In CoreFoundation, there were a tiny number of all-singing-all-dancing collections which tried to intuit what you were using them for, and pick the correct data structure internally: http://ridiculousfish.com/blog/posts/array.html

Swift isn’t like this. Swift has a much larger number of collections, and it’s up to the programmer to pick the right one for the job.

In Swift, if you want a set of contiguous ints, the right data structure is IndexSet. It’s super-fast at this test: way less than 1ns/el for all operations, and a build/copy ratio of pretty close to 1 (between 0.4 and 1.5, on my workstation).

The next question is: why is Set especially slow for this case? Why does it run into linear probing issues at all? Ints in Swift return different hashValues so it’s not obvious why they would collide at all.

A Set created with no initial value or minimumCapacity will be as small as possible. (The source isn’t available, but one place I found says 1 bucket, and another says 0.) It’s hitting collisions all the time because it’s creating a tiny hash table, then as you add more elements it needs to copy that to a bigger hash table, and so on, ad infinitum. When you add a million items, one at a time, it doesn’t know until the last one that you intended to add a million items all along.

Sure enough, if I change the Set() inits to Set(minimumCapacity: 2*size), the code runs drastically faster, and the copy/build ratio is under 1.0 in all cases.

No hash table implementation can grow indefinitely with no performance hit. Either it needs to start out big (and waste memory, if you only needed a few elements), or it needs to copy itself as it grows (and waste time, if you know you need a lot). You can solve this by telling it to start out big.

You’re seeing quadratic time not because of linear probing, but because of bucket copying.

I used an Int data type just to keep things simple, but the problem is general.

Ints in Swift return different hashValues so it’s not obvious why they would collide at all.

It does not matter that they are integers except for the fact that it is a lot easier to benchmark (since at the core, it is much faster).

Modify my code, it will take you seconds… just cast the Ints to Strings. The ratio blows up just the same.

A Set created with no initial value or minimumCapacity will be as small as possible. (…) You’re seeing quadratic time not because of linear probing, but because of bucket copying.

I agree that if you set the capacity high enough, the problem will go away. It is one of several ways to you can solve this problem.

But in my code, I build two sets starting from an empty set. It is the same problem. The only difference in the second case is the order in which I build it.

Timsays:

“But in my code, I build two sets starting from an empty set. It is the same problem. The only difference in the second case is the order in which I build it.”

Then why, according to your theory, would linear probing affect one, but not the other? Shouldn’t inserting values in either order be just as susceptible to the linear probing penalty? What’s special about the second ordering that makes it slow? Inserting them backwards is still fast, and so is inserting odds-then-evens.

The “1000” is a red herring. It doesn’t affect performance at all. Even the cost of iterating over the first Set is negligible. As you say, this is purely about the order in which items are inserted into a Set.

“just cast the Ints to Strings. The ratio blows up just the same”

I assume you mean “init”, not “cast” (which would throw an exception and abort the process), and no, the ratios don’t blow up in the same way for me. The first few are 5.17, 7.09, 7.07, 24.59, 18.41. It’s generally going up, but it’s not quadratic, or even monotonic. Sometimes when the data size doubles, it goes the same speed.

For that matter, the Int version doesn’t show quadratic behavior, either. Why is the n=4M case consistently so much faster than the n=2M case? Why is the n=32M case almost exactly the same speed as the n=16M case? This doesn’t look like any quadratic algorithm I’ve ever seen. I graphed the powers-of-two from n=1M to n=64M and they appear to level off.

I’m seeing a lot of interesting hypotheses, but no smoking gun yet. You extrapolated from 2 data points to “quadratic”, and “linear probing is the culprit”. But when I ran the same test, incrementing |size| by 10% each time, rather than 100%, I got this:

There’s clearly a lot more going on than just “quadratic complexity”, or that you might guess from looking at 2 points. Half the time, the “rebuild” is slower, but half the time it’s the same speed (or slightly faster). When n=14.4M, for example, the rebuild rate is over 1000 times *faster* than when n=8.95M, despite having over 60% more data.

It’s true that order of insertion matters, but I don’t think there’s enough information yet to pinpoint the cause of this. This is a fascinating puzzle you’ve stumbled on!

Then why, according to your theory, would linear probing affect one, but not the other?

Because in the second case, we get an adversarial insertion order.

Shouldn’t inserting values in either order be just as susceptible to the linear probing penalty?

I did not refer to a linear probing penalty. Linear probing is fast and efficient, except when it fails.

What’s special about the second ordering that makes it slow?

It is special because it is aware of the hash values.

The â€œ1000â€ is a red herring.

It is not necessary to get this effect, you are right, but without this step, I was sure to get a comment to the effect that it is trivial to copy a set quickly without a for-loop.

I assume you mean â€œinitâ€, not â€œcastâ€ (which would throw an exception and abort the process)

I used the term “cast” as short for “type casting”:

In computer science, type conversion, type casting, and type coercion are different ways of changing an entity of one data type into another. An example would be the conversion of an integer value into a floating point value or its textual representation as a string, and vice versa. https://en.wikipedia.org/wiki/Type_conversion

You must be thinking of something else.

The first few are 5.17, 7.09, 7.07, 24.59, 18.41. It’s generally going up, but it’s not quadratic, or even monotonic.

I did not get to the notion of quadratic complexity by looking at the numbers. It is very hard to establish the computational complexity of a problem by looking at the actual speed.

Java 8’s HashMap also added an improvement where, above a certain size, the colliding entries form a balanced tree instead of a linked list (see http://openjdk.java.net/jeps/180), improving the worst case scenario to O(log n).

Arjun Menonsays:

This has to do with cache locality. Normally Build and Reinsertion should _not_ have different performance characteristics. In both cases, you are inserting elements. In Build you start with the empty set. In Reinsertion you start with a set with a few elements. Theoretically, if you assume that iterating/reading a set is fast, there would be no difference, since they are doing the same thing (“insert”).

The performance difference arises because reading from another set before each insertion, results in more cache misses. With such large data sets, the cache is being completed utilized, and reading from another region in memory before each write means more stuff be cached. Thus, more cache misses. That’s what going on here.

In Build you start with the empty set. In Reinsertion you start with a set with a few elements.

As pointed out in other comments, this is not actually relevant. I started the second phase with an element to avoid comments to the effect that I am copying the set in a stupid manner.

You can start with empty sets in both instances, it won’t change the result.

Theoretically, if you assume that iterating/reading a set is fast, there would be no difference, since they are doing the same thing (â€œinsertâ€).

There is no theoretical result to says that insertion order is irrelevant.

The performance difference arises because reading from another set before each insertion, results in more cache misses. With such large data sets, the cache is being completed utilized, and reading from another region in memory before each write means more stuff be cached. Thus, more cache misses. That’s what going on here.

If you look at the actual benchmarking code, on GitHub, you will see that both read their data from arrays. The only difference is the insertion order.

Timsays:In CoreFoundation, there were a tiny number of all-singing-all-dancing collections which tried to intuit what you were using them for, and pick the correct data structure internally: http://ridiculousfish.com/blog/posts/array.html

Swift isn’t like this. Swift has a much larger number of collections, and it’s up to the programmer to pick the right one for the job.

In Swift, if you want a set of contiguous ints, the right data structure is IndexSet. It’s super-fast at this test: way less than 1ns/el for all operations, and a build/copy ratio of pretty close to 1 (between 0.4 and 1.5, on my workstation).

The next question is: why is Set especially slow for this case? Why does it run into linear probing issues at all? Ints in Swift return different hashValues so it’s not obvious why they would collide at all.

A Set created with no initial value or minimumCapacity will be as small as possible. (The source isn’t available, but one place I found says 1 bucket, and another says 0.) It’s hitting collisions all the time because it’s creating a tiny hash table, then as you add more elements it needs to copy that to a bigger hash table, and so on, ad infinitum. When you add a million items, one at a time, it doesn’t know until the last one that you intended to add a million items all along.

Sure enough, if I change the Set() inits to Set(minimumCapacity: 2*size), the code runs drastically faster, and the copy/build ratio is under 1.0 in all cases.

No hash table implementation can grow indefinitely with no performance hit. Either it needs to start out big (and waste memory, if you only needed a few elements), or it needs to copy itself as it grows (and waste time, if you know you need a lot). You can solve this by telling it to start out big.

You’re seeing quadratic time not because of linear probing, but because of bucket copying.

Daniel Lemiresays:I used an Int data type just to keep things simple, but the problem is general.

Ints in Swift return different hashValues so it’s not obvious why they would collide at all.It does not matter that they are integers except for the fact that it is a lot easier to benchmark (since at the core, it is much faster).

Modify my code, it will take you seconds… just cast the Ints to Strings. The ratio blows up just the same.

A Set created with no initial value or minimumCapacity will be as small as possible. (…) You’re seeing quadratic time not because of linear probing, but because of bucket copying.I agree that if you set the capacity high enough, the problem will go away. It is one of several ways to you can solve this problem.

But in my code, I build two sets starting from an empty set. It is the same problem. The only difference in the second case is the order in which I build it.

Timsays:“But in my code, I build two sets starting from an empty set. It is the same problem. The only difference in the second case is the order in which I build it.”

Then why, according to your theory, would linear probing affect one, but not the other? Shouldn’t inserting values in either order be just as susceptible to the linear probing penalty? What’s special about the second ordering that makes it slow? Inserting them backwards is still fast, and so is inserting odds-then-evens.

The “1000” is a red herring. It doesn’t affect performance at all. Even the cost of iterating over the first Set is negligible. As you say, this is purely about the order in which items are inserted into a Set.

“just cast the Ints to Strings. The ratio blows up just the same”

I assume you mean “init”, not “cast” (which would throw an exception and abort the process), and no, the ratios don’t blow up in the same way for me. The first few are 5.17, 7.09, 7.07, 24.59, 18.41. It’s generally going up, but it’s not quadratic, or even monotonic. Sometimes when the data size doubles, it goes the same speed.

For that matter, the Int version doesn’t show quadratic behavior, either. Why is the n=4M case consistently so much faster than the n=2M case? Why is the n=32M case almost exactly the same speed as the n=16M case? This doesn’t look like any quadratic algorithm I’ve ever seen. I graphed the powers-of-two from n=1M to n=64M and they appear to level off.

I’m seeing a lot of interesting hypotheses, but no smoking gun yet. You extrapolated from 2 data points to “quadratic”, and “linear probing is the culprit”. But when I ran the same test, incrementing |size| by 10% each time, rather than 100%, I got this:

https://imgur.com/a/AmA3r

There’s clearly a lot more going on than just “quadratic complexity”, or that you might guess from looking at 2 points. Half the time, the “rebuild” is slower, but half the time it’s the same speed (or slightly faster). When n=14.4M, for example, the rebuild rate is over 1000 times *faster* than when n=8.95M, despite having over 60% more data.

It’s true that order of insertion matters, but I don’t think there’s enough information yet to pinpoint the cause of this. This is a fascinating puzzle you’ve stumbled on!

Daniel Lemiresays:Then why, according to your theory, would linear probing affect one, but not the other?Because in the second case, we get an adversarial insertion order.

Shouldn’t inserting values in either order be just as susceptible to the linear probing penalty?I did not refer to a linear probing penalty. Linear probing is fast and efficient, except when it fails.

What’s special about the second ordering that makes it slow?It is special because it is aware of the hash values.

The â€œ1000â€ is a red herring.It is not necessary to get this effect, you are right, but without this step, I was sure to get a comment to the effect that it is trivial to copy a set quickly without a for-loop.

I assume you mean â€œinitâ€, not â€œcastâ€ (which would throw an exception and abort the process)I used the term “cast” as short for “type casting”:

You must be thinking of something else.

The first few are 5.17, 7.09, 7.07, 24.59, 18.41. It’s generally going up, but it’s not quadratic, or even monotonic.I did not get to the notion of quadratic complexity by looking at the numbers. It is very hard to establish the computational complexity of a problem by looking at the actual speed.

Arseny Kapoulkinesays:The number of times people rediscover this issue with linear probing never fails to amaze me.

First in Delphi (the article is only available in Russian, but Google Translate works okay): https://habrahabr.ru/post/282902/

Then in Rust: http://accidentallyquadratic.tumblr.com/post/153545455987/rust-hash-iteration-reinsertion

Now in Swift.

Daniel Lemiresays:Thanks for the excellent comment.

Viktor SzathmÃ¡rysays:Java 8’s HashMap also added an improvement where, above a certain size, the colliding entries form a balanced tree instead of a linked list (see http://openjdk.java.net/jeps/180), improving the worst case scenario to O(log n).

Arjun Menonsays:This has to do with cache locality. Normally Build and Reinsertion should _not_ have different performance characteristics. In both cases, you are inserting elements. In Build you start with the empty set. In Reinsertion you start with a set with a few elements. Theoretically, if you assume that iterating/reading a set is fast, there would be no difference, since they are doing the same thing (“insert”).

The performance difference arises because reading from another set before each insertion, results in more cache misses. With such large data sets, the cache is being completed utilized, and reading from another region in memory before each write means more stuff be cached. Thus, more cache misses. That’s what going on here.

Daniel Lemiresays:In Build you start with the empty set. In Reinsertion you start with a set with a few elements.As pointed out in other comments, this is not actually relevant. I started the second phase with an element to avoid comments to the effect that I am copying the set in a stupid manner.

You can start with empty sets in both instances, it won’t change the result.

Theoretically, if you assume that iterating/reading a set is fast, there would be no difference, since they are doing the same thing (â€œinsertâ€).There is no theoretical result to says that insertion order is irrelevant.

The performance difference arises because reading from another set before each insertion, results in more cache misses. With such large data sets, the cache is being completed utilized, and reading from another region in memory before each write means more stuff be cached. Thus, more cache misses. That’s what going on here.If you look at the actual benchmarking code, on GitHub, you will see that both read their data from arrays. The only difference is the insertion order.