10th March 2023, 9 min read

Trimming spaces from strings faster with SVE on an Amazon Graviton 3 processor

13 thoughts on “Trimming spaces from strings faster with SVE on an Amazon Graviton 3 processor”

Bill says:

March 13, 2023 at 2:28 pm

I wonder if it would speed up Unicode whitespace removal or could be twisted into removing all whitespace type of characters (VT, HT, NL, CR, …)?
1. Daniel Lemire says:
  
  March 13, 2023 at 2:59 pm
  
  It can be generalized, yes. Note that the functions described in the blog post will work correctly with UTF-8 inputs.
Evan says:

March 13, 2023 at 7:11 pm

The documentation I have seen shows that SVE does not need a drain operation, it can instead be predicated to do a partial operation for the remainder elements.

See slides 32 and 33 of this SVE presentation for an example of what I mean.

Is there a specific reason why you coded it do work in two steps? I could see this being a naive translation from SIMD (NEON) style to SVE, but I don’t think it is leveraging the full benefits of SVE predication.
1. Daniel Lemire says:
  
  March 13, 2023 at 7:56 pm
  
  See slides 32 and 33 of this SVE presentation for an example of what I mean.
  
  If you write it in this manner, you will have an additional intrinsic function per loop (and the accompanying instruction) compared to my solution (focus on the main loop).
  
  Is there a specific reason why you coded it do work in two steps?
  
  For better performance. I have added the single-loop function to my benchmark, you can test it out and verify that it is slower.
  1. Evan says:
    
    March 13, 2023 at 8:13 pm
    
    Thanks for the response, I suspected that may be the reason why.
    
    What if you use the optimization given on slide 34 to elide the i < len comparison? Does it catch back up? Sorry I do not have access right now to test it myself 🙂
    1. Daniel Lemire says:
      
      March 13, 2023 at 9:01 pm
      I have added the one loop alternative you pointed to. The performance is slightly improved but still very much inferior to the two-loop approach.
      
      I should point out that my expectation is that my implementation with a loop and a trail is probably too simple and could be optimized further (at the cost of greater code complexity).
      
      scalar cycles/bytes 1.75
      sve one loop cycles/bytes 0.70
      
      sve one loop alt cycles/bytes 0.66
      
      sve cycles/bytes 0.47
      
      So, unfortunately, the pretty code that ARM is displaying on their slides is likely not optimal.
      
      Of course, this could change with future compilers and/or processors. For example, the compiler could do the unrolling for you.
  2. Evan says:
    
    March 13, 2023 at 8:17 pm
    
    Also sorry for saying it was a somehow bad or naive translation from neon, clearly not the case.
    
    Enjoyed the post, thanks for sharing.
    1. Daniel Lemire says:
      
      March 13, 2023 at 9:04 pm
      
      Also sorry for saying it was a somehow bad or naive translation from neon, clearly not the case.
      
      You might be interested in going back in time to how I started with SVE:
      https://lemire.me/blog/2022/06/23/filtering-numbers-quickly-with-sve-on-amazon-graviton-3-processors/
      
      You might notice that my early code matched the ARM slides more than my recent code.
      
      So, in some sense, what looked to you like “naive” code is, from my point of view, more sophisticated (optimized) code…i.e., it will run faster.
Evan says:

March 13, 2023 at 7:13 pm

Also why does the second operation use svcmpeq while the loop uses svcmpne?
1. Daniel Lemire says:
  
  March 13, 2023 at 7:57 pm
  
  A typographical error. Thanks for pointing it out.
-.- says:

March 14, 2023 at 10:34 am

Do you have any speed comparisons against NEON versions of the earlier despacer?
1. Daniel Lemire says:
  
  March 14, 2023 at 3:58 pm
  
  My source code online includes the NEON version. The performance is comparable. However, the NEON version requires a large table.
  1. -.- says:
    
    March 15, 2023 at 1:18 pm
    
    Thanks for the info!