.NET Collection Madness - Part I

Posted on January 24, 2005  |  

Posted in Development

13 comments

Ever since I began to code for .NET I’ve been confused by the host of collections available in the .NET Framework. On the one hand, there are so many of them in the System.Collections and System.Collections.Specialized namespaces. On the other hand, it’s a pity there’s so little “prescriptive guidance” as to when to use which collection and why.

The other day, while reading Improving .NET Application Performance and Scalability, I came across an interesting discussion of collections. For some reason I got stuck on the explanations. I spent two hours analyzing 10 measly pages on the train, a total of another 2 hours poking around various collections with Reflector, and then another hour re-reading the section and comparing it with MSDN. I simply couldn’t let it go because I realized I was completely confused.

The confusion stems from inconsistent naming of collections, strange design of some of them and misleading advice on MSDN. I’d like to share my perspective on .NET collections. Please, feel free to state your opinion, observations and experiences.

Terminology

I think the legs of confusion grow from terminology. In .NET collections are referred to as collections, lists, dictionaries. These terms are—at times—used interchangeably.

For example: SortedList. By definition, it represents a collection of key-and-value pairs that are sorted by the keys and are accessible by key and by index. You add a pair which makes it a dictionary, not a list. Internally it maintains an array of keys and an array of values, but on the outside it’s a dictionary. I’ll get back to this point later.

Another good example is the NameValueCollection class. MSDN states that it represents a sorted collection of associated String keys and String values that can be accessed either with the key or with the index. I don’t believe this collection has anything to do with sorting, and I guess I’m not alone in this.

I think it would’ve made more sense if collections were defined as follows:

  • Collection: any data container in general, be it a list or a dictionary.
  • List: a collection of objects that can be individually accessed by index, exactly as the IList interface is defined.
  • Dictionary: a collection of key-and-value pairs, exactly as the IDictionary interface is defined.

Even though definitions of ICollection, IList and IDictionary are accurate, naming of classes and their functionality aren’t. As I pointed out, SortedList is out of place.

Let’s review some of the collections in common use.

ArrayList

This one is a universal soldier. It’s mission is to store items of type Object in an internal list, which means you can store anything in it. You can even add null. As elements are added, it resizes itself. Initial capacity defaults to 16.

Duplicates

Yes, tolerates duplicates.

Lookup by index

Very fast. All it does it pull out an element from an internal array by position.

Lookup by value

IndexOf performs a linear search, therefore, the average execution time is proportional to Count which is slow.

Contains performs a linear search, therefore, the average execution time is proportional to Count which is slow, too. Same definition as above, different internal implementation.

BinarySearch is faster and is recommended for efficient searches. Interestingly enough, BinarySearch can conduct case-insensitive searches if you pass it CaseInsensitiveComparer:

// Culture-agnostic search
list.BinarySearch (CaseInsensitiveComparer.DefaultInvariant)

// Culture-specific search
CaseInsensitiveComparer.Default

Here’s something that bit me today: if you pass an instance of IComparer to BinarySearch you need to sort the collection first, as MSDN points out:

If comparer is provided, the elements of the ArrayList are compared to the specified value using the specified IComparer implementation. If the ArrayList is not already sorted according to the sort order defined by comparer, the result might be incorrect.

The above sample code should look like this:

// Culture-agnostic search
list.Sort (CaseInsensitiveComparer.DefaultInvariant);
list.BinarySearch ("something", 
    CaseInsensitiveComparer.DefaultInvariant)

// Culture-specific search
list.Sort (list.BinarySearch);
list.BinarySearch ("something", 
    CaseInsensitiveComparer.Default);

Sorting

Sorting is available via the Sort method. Remember the little gotcha with ICompare (see above). Frequent sorting is expensive. It’s recommended that you make several changes in a batch and then resort the collection.

Implications

If the number of elements in the list reaches its capacity, it doubles the capacity, allocates a bigger buffer and copies all elements to it. Therefore, try to preallocate to the desired size when you know it in advance to avoid reallocations and copying.

Another trouble spot is boxing. Every time you store a value type in an ArrayList it’s boxed. If you don’t know what boxing is you really should. Read Eric Gunnerson’s Nice box. What’s in it? and Open the box! Quick! MSDN has a neat example of how to build a strongly typed collection and avoid boxing. It’s ironic that you have to build one yourself.

There’s so much to say about each collection, but I have to stop somewhere. Let’s move on and review ArrayList’s close relative StringCollection.

StringCollection

The mission of this class is to manipulate a list of string. Improving .NET Application Performance and Scalability calls it a strongly typed ArrayList.

The class used ArrayList as its element storage, therefore it closely resembles ArrayList: it’s flexible to frequent changes (beware of memory relocations, though), not so fast on sorting, provides a fast lookup by index, slow lookup by value (same ol’ IndexOf and Contains).

Ironically, StringCollection has no analoge of a fast search BinarySearch-style! Both IndexOf and Contains use slow linear search. This gaping hole was somewhat shocking to me. It pretty much renders this class useless. You design a class to manipulate lists of strings, simply delegate it to ArrayList and cut back on features!

I’m sure class library designers had something in mind, and I wonder what it was. Perhaps it prevents typecasting of elements, but the tradeoff is not fair.

Exotic Collections

Quite honestly, I don’t want to cover exotic collections, such as BitArray, Queue, Stack.

Custom Collections

To learn how to develop your own collections which avoid type casting overhead see Walkthrough: Creating Your Own Collection Class.

To Be Continued

In my second part of .NET Collection Madness I’ll review some of the dictionary-type classes, such as Hashtable, ListDictionary, HybridDictionary, etc.

13 comments

James Curran
on January 25, 2005

Just to clarify something you wrote.

To use BinarySearch, the collection must be sorted, using the same Comparer as the search uses.

If you use the default comparer with BinarySearch, then the collection must still be sorted using that.

A binary search requires a sorted array. That's just the nature of the beast.


amanda
on April 10, 2007

I guess I'm in the same boat. I understand collections at the rudimentary level. I can use them. And I do. But I can't say that I use them well, or even, or the right reasons. I'd love to see some examples. Possibly some examples using the best approach for each type of collection. I know this is a lot of work. I've spent a ton of time wondering around the internet trying to get my head around this. I'd be grateful for just some resources that someone might be able to offer... A book, an article, etc.


Enio
on July 12, 2007

I'm new in .Net world, and I need a collection that not allows duplicates. (Like a Set in Java).

Do you know if theres a collection like this in .Net ?

Thanks,


Jeff
on August 14, 2007

Your comments about confusing MSDN documentation to me sums up all MSDN documentation and implementations. Thank you, thank you, thank you for helping clear up the confusion, and for admitting there is confusion. I'm fairly new to .net and coming from delphi where everything is clear, with lots of simple examples, its been a frustrating .


Milan Negovan
on August 15, 2007

I think MSDN is generally very good. I always install the latest MSDN collection for offline use. It's very handy.

Collections, in particular, haven't been designed that well, so I imagine documentation writers have a hard time explaining the subject.


David Friedland
on September 6, 2007

I agree. Why are there so many??

But what I NEED to know and I'm trying uber-hard to find out is: What's an Unsorted SortedList?

I have a list (or dictionary if you want to get technical) that is currently implemented using a SortedList. Now the list that fills it need to be in the order it was added, not alphabetical.

I tried a Collection - no luck, when my function returns the collection to the dropdownlist's bind event, it bombs. (because there's an issue with the Key/Value elements of the collection
" Me.ddlState.DataTextField = "Value"
Me.ddlState.DataValueField = "Key""
And there appears to be an extra element in my collectiion that doesn't have either property, it's simply a string named "Empty placeholder to adjust for 1-based array." which I can't remove.

I also tried an ArrayList, but that bombed even earlier when I tried adding 2 values to it: "states.Add(tmpElement.Attributes("code").Value, tmpElement.Attributes("name").Value)"

Please Help!


Craig
on September 20, 2007

I agree. Here I am, trying to implement some sort of collection of entities and previously, using VB6 (I code in c# now) I would create a custom class, code the add method to prevent duplicates. Now, years later and moving toward .NET 3.5 when everything is supposed to be easier - I can't for the life of me understand why this is so complicated. Surely, a collection should exist that checks duplicates? Why must I use a dictionary? I don't want a dictionary, I want a collection. I want the collection to know that this entity exists already and throw an exception. Why not? Collection.Contains returns true so it knows it's a duplicate. So many namespaces and naming conventions is all so frustratingly stupid. These things should be straightforward so I can focus on the business logic ffs.


Phebe
on February 11, 2009

Hi David Friedland,
If you need a list that needs to be in the order it was added and not alphabetical, you could try using the
NameValueCollection class.


Steve
on May 17, 2009

The MSDN example of building a strongly-typed collection still has an asignment:

List[index] = value;

Surely the Int16 value will be boxed to put it into the list. If this is not the case then why is it necessary to use a cast to extract it again?


Milan Negovan
on May 21, 2009

Steve, unless it's a generic collection, the value will be boxed. Hence the unfortunate cast.


Darryl
on December 21, 2009

Thanks for this information. I agree with others about the poor design and documentation of the MS collection classes.

I think that the heart of the problem is that a lot of MS designers and writers lack real world commercial development experience. That is why designs often have fancy features that will probably never be needed in most commercial applications and at the same time lack essential features.

I think that MS software has become far too complex and obscure - this actually harms the IT industry rather than helping it. Far too much time and energy is consumed trying to understand (and work around problems) in MS software and that leaves too little left over for designing and building the solution itself.


trasloco milano
on December 24, 2009

Wow, I never knew that .NET Collection Madness - Part I. That's pretty interesting...


naval
on February 25, 2010

its a very gtood article. i think we have to understand the base of any technologies and methodlogy,