.NET Collection Madness - Part II

Posted on February 23, 2005  |  

Posted in Development

13 comments

In my previous post with the same title I alluded to confusion I encountered when I started poking around collections in the .NET Framework. I guess the problem with them is to figure out what each of them was created for and understand their benefits and drawbacks.

The previous post was mostly about lists. This time around let’s review some dictionary-type collections.

NameValueCollection

This is a really weird one. As MSDN states, it represents a sorted collection of associated String keys and String values that can be accessed either with the key or with the index.

First, there’s nothing sorted about it. Under the hood it uses a Hashtable which, by definition, is not suitable for sorting. I guess you can call AllKeys, which returns an array of strings, and sort it then, but I believe documentation is wrong on this account.

Second, the way you store anything in NameValueCollection is by passing a string name/string value pair, which makes it a dictionary. It does allow duplicates because the underlying storage is a hashtable of dictionary entries and every key has an ArrayList of values.

Another oddity about this class is that it doesn’t expose SyncRoot which pretty much every other collection does. MSDN simply suggests rolling your own wrapper of NameObjectCollectionBase and get to SyncRoot that way. I needed a way to synchronize access to this kind of collection in my recent DICT Client project. I didn’t want to go that far with a custom wrapper, so I dumped this class.

I must say, though, NameValueCollection is recommended for management of frequently changing data. This class is also useful for fast retrieval.

One peculiar thing about this collection (something I haven’t seen a mention of) is if you stored several values under one key, you get them back as a comma-delimited string!

public virtual string Get(string name) 
{
 ArrayList list1 = (ArrayList) base.BaseGet(name);
 return NameValueCollection.GetAsOneString(list1);
}

private static string GetAsOneString(ArrayList list)
{
 int num1 = (list != null) ? list.Count : 0;

 if (num1 == 1)
  return (string) list[0];
 
 if (num1 <= 1)
  return null;
 
 StringBuilder builder1 = new StringBuilder((string) list[0]);

 for (int num2 = 1; num2 < num1; num2++)  {
  builder1.Append(',');
  builder1.Append((string) list[num2]);
 }

 return builder1.ToString();
}

I wish they added an extra space for readability, but it’s still pretty cool.

SortedList

This is quite an interesting class. In fact, the only class which sorts automatically. Its startup time is relatively high because it needs to sort items up front, but after that retrieval works really fast. Internally it uses BinarySearch, which we already know is fast, and as James Curran pointed out, the collection must be sorted in either case for BinarySearch to do its work right. IndexOf works fast too because it delegates lookup to BinarySearch.

This class is not recommended for storing frequently changing data for obvious reasons (need to resort). If your data changes often, put it in an ArrayList instead and run its Sort method.

Again, SortedList is defined at MSDN as a collection of key-and-value pairs that are sorted by the keys and are accessible by key and by index, which—in essence—makes it a dictionary, not a list.

Overall this is a very helpful class with a slew of methods to retrieve all values or all keys, see if a certain value and/or key exists, etc.

StringDictionary

By definition, this is a strongly typed Hashtable which allows you to store string key/string value pairs. The benefit is that is saves a typecast from Object to String when you read something back.

This class is weird in its own way. When you add a key/value pair, it converts the key to lower case:

public virtual void Add(string key, string value)
{
 this.contents.Add(
   key.ToLower(CultureInfo.InvariantCulture), value);
}

Again, in my DICT Client project I needed to get a list of all keys, but having realized they were miraculously transformed to lower case I dumped StringDictionary. This is a really weird way to prevent insertion of duplicates. Especially since its storage is a plain Hashtable.

Case Insensitive Hashtable

I swear I didn’t know you could create a Hashtable which ignored casing of its keys. Almost every book out there harps on the fact that keys are case sensitive. Creating one is no rocket science (see Performing Culture-Insensitive String Operations in Collections). As an added bonus, this link also demonstrates how to create a case-insensitive comparer for a SortedList.

If you work with “tr-TR” (Turkish in Turkey) and “az -AZ-Latn” (Azeri (Latin) in Azerbaijan) cultures, I strongly suggest you read through the entire section Performing Culture-Insensitive String Operations.

Hashtable, HybridDictionary and ListDictionary

I think the name of ListDictionary is the worst one in the entire collection namespace. Is it a list? Or a dictionary? Both?? It’s a very mediocre name which tells you nothing. Open it up in Reflector and you learn it’s a linked list.

MSDN suggests you shouldn’t store more than 10 entries in it for “optimal” performance. Its methods are O(n) operations much like several other collection classes, so why 10? Linked lists don’t perform that bad so why blacklist them? I can’t see any indication why 10 is the limit, so if anyone knows, please speak up.

HybridDictionary is a chameleon class which starts off faking a ListDictionary (no more than 10 PT Cruisers per customer, please), and over 10 entries it turns into a Hashtable. HybridDictionary even has a private method called ChangeOver to kick off this mutation.

What beats me is why you can’t write one class, not three, and build all this logic into it.

Conclusion

I wish I could just draw a flow chart to help you make decisions about which collection to choose. But I can’t. In fact, every time I’m faced with such a decision, I have headaches. .NET collections are written in such a way that you’re left with a feeling the friendly car salesmen ripped you off instead of giving you the best deal. All I could do was outline some strange nuanses.

Back in my C++ days I used the Standard Type Library (STL) for years. STL, written by the leading industry specialists, is a life saver of every C++ developer. You can find several excellent books on STL by big-name authors and master it quite easily.

It’s a shame .NET collections are nowhere like STL. If you make a language so similar to C++ and Java, why not reuse the best? With the advent of generics in 2.0 I’m hoping collections will become easier to comprehend. Until then…

Happy hacking!

13 comments

Damien McGivern
on February 24, 2005

I believe even Microsoft devs have acknowledge that they mucked up Collections in 1.x which is why there so many breaking changes in 2.0


Bill V
on April 25, 2005

This is definitely the best article I've ever read on .NET Collection, which i never managed to grasp until today.

In my 3 years .NET development work, collections have always been a headache: every time i need to make a decision about which existing collection to use, i almost always ended up creating my own.

The collections that i often use are:
- ArrayList
- NameValueCollection


PeterB
on August 9, 2005

I appreciate the posts, too. And you can rest assured that you aren't the only one who finds .NET 1.x's collections a maze of confusion. I'd just like to add that MS guidance also recommends using arrays in preference to collections unless you absolutely need the special functionality provided by a collection.


Pravin Paratey
on August 14, 2005

This is one of the best articles I've come across. In one page I learnt all that I wanted to about collection classes. Thanks :)


Miha Markic
on October 10, 2005

Hi Milan,

Great overview on collections. I might add one piece of the puzzle - about NameValueCollection (my guesses):
- its main purpose is to be used in asp.net (you see it on lot of places)
- that's why it doesn't implement SyncRoot
- that's why it can create a comma delimited list


Eric Smith
on January 31, 2006

Thanks for this article and to all posters. I'm in the progress of migrating chunks of VB6 to .net. I'm kind of new to the .net world and this link popped up when I was trying to find any article on when to use one collection versus another. Every little bit of information helps!


Andreas
on March 1, 2006

Great article! Too bad you couldn't draw up the "Collection Selection Flow Chart for Dummies" - it would easily have made it to my cubicle wall...


Dave
on March 10, 2006

Just today I was making a class to parse command line arguments for C# CLI apps (like optargs, but better). I wanted to give the Parse method an IDictionary to be able to initialize the set of parsed arguments with defaults. Simple enough, except the one thing that you'd want to initialize it with the most is ConfigurationSettings.AppSettings...which isn't an IDictionary. Just one of many bad design decisions in the C# API I'm afraid...


Jon
on January 31, 2007

The "sortedness" of the NameValueCollection comes from the fact that values can be retrieved by index, not because they are numerically or lexicographically sorted.

See the section on SortedList or MSDN for OrderedDictionary.


shrike
on March 24, 2007

Good times! Followed this from Rick Strahl's post Fun with List Types for Caching Data where you linked over.

Good article, brings back 1.0 memories. One quibble though, why do you dodge on Queues and Stacks? I've always found them to be very useful.


Milan Negovan
on March 26, 2007

I didn't really have any gripes with Queue and Stack. It's the other collections that give me headaches.


Silvana
on June 21, 2007

Excellent article! It's the best overview on NET Collections I've ever come across. But there's one thing I couldn't find. I'm in a process of migrating a Java proyect to a .NET (C#) one. And there is a special kind of Linked list that I could'nt find its equivalent in NET. It's LinkedHashSet. It has its special features but there's one which is the most important: it doesn't permit duplicates and that's great. But all lists I found in .NET allow duplicates and this's causing me a lot of headaches.
Does anybody know which kind of linked list I could use which does not permit duplicates?
Thanks a bunch!


Anonymous
on May 17, 2008

The information in this article is deprecated by generic collections. Here's a chart that shows a more modern way to use collections.

http://www.istcorporate.com/DotNetCollectionChart.htm