Understanding the nuances of C# Collections

This article is based on .NET 9 and C# 13. I aim to cover the behaviors of common collections within the System.Collections.Generic namespace.

Insertion behaviors in Dictionary

The Dictionary<TKey, TValue> provides a mapping from a set of keys to a set of values. Values can be inserted into a dictionary with varying behaviors. Insertion in a Dictionary is implemented by the private method TryInsert. This method accepts an additional parameter behavior, which is an enum with values IgnoreInsertion, OverwriteExisting, and ThrowOnExisting. Consider the following code, where each operation behaves differently:

Dictionary<int, int> dict = new Dictionary<int, int>();
dict[1] = 3; // Calls underlying private method TryInsert with behavior OverwriteExisting
Debug.Assert(dict[1] == 3);
dict.TryAdd(1, 5); // Calls underlying private method TryInsert with behavior IgnoreInsertion
Debug.Assert(dict[1] == 3);
dict.Add(1, 4);  // Calls underlying private method TryInsert with behavior ThrowOnExisting

The Curious case of 'Capacity' and 'Count'

The Dictionary also exposes two properties: Count and Capacity. Count denotes the number of key-value pairs in the dictionary, and Capacity denotes the size of the underlying container that holds the dictionary. The Capacity property was only exposed from .NET 9 and is not available in older versions of .NET. We can specify the value for Capacity in the constructor: Dictionary<int, int> dict = new Dictionary<int, int>(5);, which will create an underlying container of size 7. Why size 7? Initially, I was puzzled too, so I checked the source code of the Dictionary implementation.

The size is 7 because .NET uses a prime array, and the code selects the smallest prime number from the array that is larger than the given capacity. Here is the prime array for reference:

internal static ReadOnlySpan&lt;int&gt; Primes =&gt;
    [
        3, 7, 11, 17, 23, 29, 37, 47, 59, 71, 89, 107, 131, 163, 197, 239, 293, 353, 431, 521, 631, 761, 919,
        1103, 1327, 1597, 1931, 2333, 2801, 3371, 4049, 4861, 5839, 7013, 8419, 10103, 12143, 14591,
        17519, 21023, 25229, 30293, 36353, 43627, 52361, 62851, 75431, 90523, 108631, 130363, 156437,
        187751, 225307, 270371, 324449, 389357, 467237, 560689, 672827, 807403, 968897, 1162687, 1395263,
        1674319, 2009191, 2411033, 2893249, 3471899, 4166287, 4999559, 5999471, 7199369
    ];

Now, with the size being 7, when we insert the 8th item into the Dictionary, .NET needs to resize the underlying container. For resizing, .NET first computes twice the current count and then finds the first prime number from the array that is larger than this value. In this example, it is 7 * 2 = 14, and the next prime in the list is 17. Therefore, after inserting the 8th item, the capacity of the dictionary will be 17.

Also, if you know the size of the data in advance, specifying the capacity can provide a minor performance boost. Please see the benchmark results below for Dictionary insertion.

Gist: Benchmark results for Dictionary insert

HashSet and OrderedDictionary also behave in the same way with respect to capacity and resizing.

For Stack<T>, Queue<T>, SortedList<TKey, TValue>, and List<T>, resizing is not based on the prime array. In these cases, the container is created with the given capacity in the constructor, and for resizing, the size is simply doubled. Consider the example: Stack<int> ints = new Stack<int>(5);. After pushing the 6th item, the size is doubled to 10.

PriorityQueue<TElement, TPriority> works similarly to Queue<T> and List<T>, but the property Capacity is not yet exposed.

Final Note

Understanding the inner workings of collections in C# can greatly enhance your ability to write efficient and effective code. As we've seen, the Dictionary<TKey, TValue> class in .NET 9 has some interesting behaviors, particularly with how it handles capacity and resizing. By leveraging the prime number array for sizing, .NET ensures that the hash table's performance remains optimal.

Remember that specifying the initial capacity can lead to performance gains, especially when the size of the dataset is known beforehand. This is because it minimizes the number of resizes that need to occur as the collection grows.

While Dictionary<TKey, TValue>, HashSet<T>, and OrderedDictionary use prime numbers to determine their capacities, other collections like Stack<T>, Queue<T>, SortedList<TKey, TValue>, and List<T> follow a simpler doubling strategy. This knowledge allows you to predict and control the memory usage and performance characteristics of your collections more accurately.

Happy coding!