Mar 15, 2010

HashSet

It has been some time since I started using, and enjoying, HashSet as a collection in C#. Especially if you are using NHibernate as an ORM solution, you will see HashSet in one-to-many relations in your POCO objects.
HashSet<T> lives in the System.Collections.Generic namespace. It is a collection of T that supports the usual Add, Remove and Contains operations, but its most important feature is that it holds only unique elements. If the collection already contains 1 and you try to add the same number again, it simply won’t be added. The collection is not ordered, and because it is hash-based, Add, Remove and Contains run in O(1) time on average, that is, the cost does not grow with the number of elements. Let’s work through some sample code and check the behavior of this collection.
In our first example, let’s create a HashSet<int>, add some numbers and print them to the console.

HashSet<int> numbers = new HashSet<int>();
numbers.Add(1); numbers.Add(3);
numbers.Add(4); numbers.Add(2);
numbers.Add(1); numbers.Add(3);

foreach (var number in numbers)
{
  Console.Out.WriteLine("{0}",number);
}

 

In the code above, I created a HashSet<int>, added the numbers 1, 3, 4, 2, 1, 3, then iterated through the collection and printed each element to the console. The output is “1, 3, 4, 2” (one number per line), because the repeated numbers are not added a second time. Keep in mind that HashSet does not guarantee any enumeration order, so you should not rely on the elements coming back in insertion order. Add actually returns a boolean value: true if the value was added to the collection, false otherwise.

What about reference types instead of value types? This time let’s create the class “Person” with 2 simple properties, and add instances of it to the hash set.
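A quick aside before the Person code: the boolean returned by Add can be observed directly. This is only a minimal sketch; the AddDemo class and its AddAll helper are hypothetical names introduced for illustration and are not part of the samples in this post.

```csharp
using System;
using System.Collections.Generic;

public static class AddDemo
{
    // Hypothetical helper: adds each value and records what Add returned.
    public static List<bool> AddAll(HashSet<int> set, params int[] values)
    {
        var results = new List<bool>();
        foreach (var value in values)
        {
            // Add returns true when the value was inserted,
            // false when it was already in the set.
            results.Add(set.Add(value));
        }
        return results;
    }

    public static void Main()
    {
        var numbers = new HashSet<int>();
        var results = AddAll(numbers, 1, 3, 4, 2, 1, 3);

        Console.WriteLine(string.Join(",", results)); // True,True,True,True,False,False
        Console.WriteLine(numbers.Count);             // 4
    }
}
```

The last two calls return false because 1 and 3 are already in the set. The Person example follows.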

Person person1 = new Person {ID = 1, Name = "person1"};
Person person2 = new Person {ID = 1, Name = "person1"};
Person person3 = new Person {ID = 2, Name = "person1"};
HashSet<Person> persons = new HashSet<Person>();
persons.Add(person1);
persons.Add(person2);
persons.Add(person3);
foreach (var person in persons)
{
   Console.Out.WriteLine(person);
}
.....
public class Person
{
   public int ID { get; set; }
   public string Name { get; set; }
   public override string ToString()
   {
        return String.Format("{0}-{1}", ID, Name);
   }
}

In this example I created 3 Person objects, 2 of which have exactly the same property values, added them to the collection, and printed all of them to the console. The result is that all 3 objects are in the collection. When we call Add, the hash set checks whether the value already exists in the collection, and it uses the default equality comparer (EqualityComparer<Person>.Default) to decide. Since Person does not override Equals and GetHashCode, that comparer falls back to reference equality, and as we newed up each object separately, they are all different objects. So how can we change the code so that objects with the same ID are treated as the same object and won’t be added to the set again? You may have guessed the answer already: we have to write an IEqualityComparer<Person> for our Person class and pass it to the HashSet constructor. Here is the updated code:

Person person1 = new Person {ID = 1, Name = "person1"};
Person person2 = new Person {ID = 1, Name = "person1"};
Person person3 = new Person {ID = 2, Name = "person1"};
HashSet<Person> persons = new HashSet<Person>(new PersonComparer());
persons.Add(person1);
persons.Add(person2);
persons.Add(person3);
foreach (var person in persons)
{
   Console.Out.WriteLine(person);
}
....
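public class PersonComparer : IEqualityComparer<Person>
{
   public bool Equals(Person x, Person y)
   {
       // Same reference (or both null) counts as equal; a single null does not.
       if (ReferenceEquals(x, y)) return true;
       if (x == null || y == null) return false;
       return x.ID == y.ID;
   }
   public int GetHashCode(Person obj)
   {
       // Hash only the field Equals compares, so equal objects share a hash code.
       return obj.ID.GetHashCode();
   }
}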
...
public class Person
{
   public int ID { get; set; }
   public string Name { get; set; }
   public override string ToString()
   {
        return String.Format("{0}-{1}", ID, Name);
   }
}

 

This way we can ensure that reference-type objects are unique in the collection too.
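A separate comparer is not the only option. As an alternative sketch (not the approach used above), the class itself can define equality by implementing IEquatable<Person> and overriding Equals and GetHashCode, in which case the default comparer picks it up and no constructor argument is needed. The choice of ID as the only identity field and the PersonSetDemo class are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Variant of Person that defines equality by ID itself (an assumption for
// this sketch; the Person class in the post leaves Equals/GetHashCode alone).
public class Person : IEquatable<Person>
{
    public int ID { get; set; }
    public string Name { get; set; }

    public bool Equals(Person other)
    {
        return other != null && ID == other.ID;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Person);
    }

    // Must agree with Equals: equal IDs give equal hash codes.
    public override int GetHashCode()
    {
        return ID.GetHashCode();
    }

    public override string ToString()
    {
        return String.Format("{0}-{1}", ID, Name);
    }
}

public static class PersonSetDemo
{
    public static void Main()
    {
        // No comparer argument: the default comparer uses IEquatable<Person>.
        var persons = new HashSet<Person>();
        Console.WriteLine(persons.Add(new Person { ID = 1, Name = "person1" })); // True
        Console.WriteLine(persons.Add(new Person { ID = 1, Name = "person1" })); // False
        Console.WriteLine(persons.Add(new Person { ID = 2, Name = "person1" })); // True
        Console.WriteLine(persons.Count); // 2
    }
}
```

Whichever approach you pick, avoid changing ID while the object is inside a set: the hash code changes with it, and the set may no longer be able to find the element.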

Since HashSet is a set :), you can use set operations on it too. LINQ has 2 interesting extension methods for this: Union and Intersect. Here is some sample code that uses them:

HashSet<int> numbers1 = new HashSet<int> {1, 2, 3, 4, 5, 6, 7};
HashSet<int> numbers2 = new HashSet<int> {10, 9, 8, 7, 6, 5, 2};

IEnumerable<int> intersect = numbers1.Intersect(numbers2);
IEnumerable<int> union = numbers1.Union(numbers2);
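It is worth knowing that the LINQ methods return lazily evaluated sequences and leave both sets untouched, while HashSet<T> also has its own IntersectWith and UnionWith methods that modify the set they are called on. The sketch below puts the two side by side, using the same numbers as above; the SetOperationsDemo class and its Common helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetOperationsDemo
{
    // Hypothetical helper: intersection as a new set, inputs left unchanged.
    public static HashSet<int> Common(HashSet<int> a, HashSet<int> b)
    {
        var result = new HashSet<int>(a); // copy, because IntersectWith mutates
        result.IntersectWith(b);
        return result;
    }

    public static void Main()
    {
        var numbers1 = new HashSet<int> { 1, 2, 3, 4, 5, 6, 7 };
        var numbers2 = new HashSet<int> { 10, 9, 8, 7, 6, 5, 2 };

        // LINQ: evaluated when enumerated, both sets stay as they are.
        IEnumerable<int> intersect = numbers1.Intersect(numbers2);
        IEnumerable<int> union = numbers1.Union(numbers2);
        Console.WriteLine(intersect.Count()); // 4  (2, 5, 6, 7)
        Console.WriteLine(union.Count());     // 10 (1 through 10)

        // HashSet's own methods: in-place, so they run on copies here.
        HashSet<int> common = Common(numbers1, numbers2);
        Console.WriteLine(common.SetEquals(new[] { 2, 5, 6, 7 })); // True

        var all = new HashSet<int>(numbers1);
        all.UnionWith(numbers2);
        Console.WriteLine(all.Count);      // 10
        Console.WriteLine(numbers1.Count); // 7, the original set is unchanged
    }
}
```

The in-place methods are handy when you want the result to stay a HashSet; the LINQ versions are convenient when an IEnumerable<int> is enough.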

As you can see, HashSet is both powerful and useful.


