, 2 min read

# How expensive are the union and intersection of two unordered_set in C++?

If you are using a modern C++ (C++11 or better), you have access to set data structures (`unordered_set`) which have the characteristics of a hash set. The standard does not provide us with built-in functions to compute the union and the intersection of such sets, but we can make our own. For example, the union of two sets A and B can be computed as follow:

```
out.insert(A.begin(), A.end());
out.insert(B.begin(), B.end());
```

where `out`

is an initially empty set. Because insertion in a set has expected constant-time performance, the computational complexity of this operation is *O*(size(A) + size(B)) which is optimal.
If you are a computer scientist who does not care about real-world performance, your work is done and you are happy.
But what if you want to build fast software for the real world? How fast are these C++ sets?

I decided to populate two sets with one million integers each, and compute how how many cycles it takes to compute the intersection and the union, and then I divide by 2,000,000 to get the time spent per input element.

intersection (unordered_set) |
100 cycles/element |
---|---|

union (unordered_set) |
250 cycles/element |

How good or bad is this? Well, we can also take these integers and put them in sorted arrays. Then we can invoke the `set_intersection`

and `set_union`

methods that STL offers.

set_intersection (std::vector) |
3 cycles/element |
---|---|

set_union (std::vector) |
5 cycles/element |

That’s an order of magnitude better!

So while convenient, C++’s `unordered_set`

can also suffer from a significant performance overhead.

What about `std::set` which has the performance characteristics of a tree? Let us use code as follows where `out`

is an initially empty set.

```
std::set_union(A.begin(), A.end(), B.begin(), B.end(),
std::inserter(out,out.begin()));
```

set_intersection (std::set) |
150 cycles/element |
---|---|

set_union (std::set) |
750 cycles/element |

As we can see, results are considerably worse.

The lesson is that a simple data structure like that of `std::vector` can serve us well.