5th October 2016, 14 min read

Variable-length strings can be expensive

20 thoughts on “Variable-length strings can be expensive”

David Tweed says:

October 5, 2016 at 4:46 pm

Out of curiosity, which compiler & version did you use for the C++? (The small string optimization in recent C++ makes them very close to fixed length strings for small strings, at the cost of a few more instructions to decide which version is being used.)
1. Daniel Lemire says:
  
  October 5, 2016 at 5:44 pm
  Out of curiosity, which compiler & version did you use for the C++?
```
$ g++ --version
g++ (Ubuntu 5.4.1-2ubuntu1~16.04) 5.4.1 20160904
```
  The small string optimization in recent C++ makes them very close to fixed length strings for small strings (…)
  
  My results are much the same under clang. With the Intel compiler, fixed-length strings are no longer any faster, but that’s because they get to be as slow as the variable-length ones.
  
  I don’t have easy access to old compilers on this exact machine.
  1. davetweed says:
    
    October 5, 2016 at 6:45 pm
    
    I’ve had a look at your code and I wouldn’t be surprised if the biggest contribution to the speed difference is that the fixed length 8 padded-strings are allowing you to use a very optimized comparison function. (I don’t have a profiler setup on my new machine yet.)
    This is an interesting result and I agree fixed length strings are interesting, I’m just interested in figuring out what precisely is happening on the machine.
  2. Maxim Egorushkin says:
    
    October 6, 2016 at 9:48 am
    
    Still not clear whether you benchmarked the C++11 std::string or C++98 std::string? http://developers.redhat.com/blog/2015/02/05/gcc5-and-the-c11-abi/
    
    clang can use the GNU C++ standard library, so the next question is which C++ standard library did you benchmark with clang?
    1. Daniel Lemire says:
      
      October 6, 2016 at 2:27 pm
      
      The command line I used to compile my small program is in the first line of the source code:
      
      https://github.com/lemire/Code-used-on-Daniel-Lemire-s-blog/blob/master/2016/10/05/pointersort.cpp
      
      This is a fresh Ubuntu install (see previous comment for version number).
      
      I encourage you to run your own experiments.
      1. Gaurav says:
        
        October 6, 2016 at 6:03 pm
        
        Memory alignment could be playing a role over here.
        
        A useful test might be to allocate memory for std::string using aligned_alloc (alignment = 8).
        
        Daniel Lemire says:
        
        October 6, 2016 at 6:59 pm
        
        I’d be interested in any number you have but I am skeptical regarding memory alignment in this case. I doubt memory access is the bottleneck.
      2. Maxim Egorushkin says:
        
        October 11, 2016 at 11:25 am
        
        I do not have Ububtu, but I still would like to know which std::string was used in your benchmarks.
Dominic Amann says:

October 5, 2016 at 5:00 pm

I suspect that a real application spends very little overall time sorting strings, so optimizing that part will have little impact on the overall performance.

I recently profiled some data acquisition middleware, and discovered that it spent 93% of it’s time waiting on the driver to deliver raw data. Optimizing the remaining 7% would be largely a waste of my time.

When looking closely at the driver – I further uncovered that I was operating at about 70% of the underlying transport mechanisms’ capacity – and improving this would require better firmware on the device side (reducing gaps between bytes and strings).

So – even though I was adding arrays of integer numbers together, and I could see ways to optimize these, it would make no appreciable difference overall.
1. Daniel Lemire says:
  
  October 5, 2016 at 5:53 pm
  
  @Dominic
  
  You’ll get no argument from me.
  
  Many systems have their performance bound set by IO, network… or other architectural constraints. That’s why we build massive systems using JavaScript (which does not even support parallelism!).
Abhimanyu Rawat says:

October 5, 2016 at 5:08 pm

As you said it’s application dependent, in MySQL where a fixed length is required the performance should be better. And going further in C padded where the string fits in word length only, there is some architecture dependency as well right, for tuning it to avoid pointers ?
1. Daniel Lemire says:
  
  October 5, 2016 at 6:10 pm
  
  in MySQL where a fixed length is required the performance should be better
  
  I expect so but I’d be interested in seeing numbers.
Leonid Boytsov says:

October 5, 2016 at 5:39 pm

GCC strings are notoriously bad, because they are copy-on-write. They even used to have atomic operations involved in simple constructors, not to mention that GNU couldn’t implement a bug-free string implementation for a multi-core environment for many years. Copy-on-read strings are probably better (unfortunately, STLPORT died).

That said, even with copy-on-read strings, accessing the string requires a dereference operation, which slows things down. Try a variant of the string, which keeps the counter followed by the data buffer. This one could be fast, IMHO.
1. davetweed says:
  
  October 5, 2016 at 6:22 pm
  
  I think that used to be the case prior to the changes for C++11. (Not that g++’s stuff is brilliant, but it does use the SSO now.)
  1. Leonid Boytsov says:
    
    October 5, 2016 at 7:20 pm
    
    Good to know, thank you! Is 1024 char a short string?
    1. Daniel Lemire says:
      
      October 5, 2016 at 7:30 pm
      
      To clarify: in the test, my strings are short (much less than 8 bytes), but I sort 1024 of them.
Leonid Boytsov says:

October 5, 2016 at 5:41 pm

BTW, Java’s string are immutable for a good reason. In a multi-threaded environment, it’s often much cheaper to copy a small string than to block a gazillion of threads on trivial string operations.
1. Daniel Lemire says:
  
  October 5, 2016 at 6:05 pm
  
  That’s a good point.
  
  Rust and Swift offer both mutable and immutable strings.
KWillets says:

October 5, 2016 at 7:50 pm

We’ve unfortunately hit the point where most dictionary words will fit into a single machine word; the classic benchmark of sorting the dictionary turns out slower using classic algorithms than sorting it fixed-length.

Some current algorithms use the fixed-length method by caching 4-8 bytes in an int next to the string pointer. For instance here: http://rd.springer.com/chapter/10.1007%2F978-3-540-89097-3_3#page-1 . It is indeed faster.
Jason Schulz says:

October 5, 2016 at 11:32 pm

It seems like adding length to the type in certain cases, similar to std::array, ultimately gives the compiler/optimizer more information to optimize with.