5th September 2019, 8 min read

Passing integers by reference can be expensive…

9 thoughts on “Passing integers by reference can be expensive…”

Travis Downs says:

September 5, 2019 at 9:32 pm

This effect is due to possible aliasing. The compiler can’t be sure the array and the passed in value don’t overlap (i.e., that the referenced value isn’t one of the array elements). In the case of aliasing, the semantics of the two loops are not the same, since the value i will be modified by the assignment (once) in addition to the increment (maybe it’s even UB due to double-modifcation w/o sequence point). So the compiler emits this conservative code which reloads the value each time. In addition to slower code due to the in-memory increment, optimizations like vectorization are inhibited.
1. Travis Downs says:
  
  September 5, 2019 at 10:07 pm
  
  Possible fixes:
  
  1) Copy i into a local before the loop. This way the compiler knows aliasing isn’t possible since i is an unescaped local. You’ll have to copy i back at the end to maintain the semantics of incrementing k times.
  
  2) Put the __restrict qualifier on eitheri or x, or both. Roughly speaking, this tells the compiler the values don’t overlap. It isn’t standard in C++, but is relatively widely supported, perhaps because it is standard in C.
  
  3) Use different types. If the type pointed to byx is different than the type of i, most compilers use type based aliasing analysis (TBAA) to prove aliasing doesn’t occur.
  
  The compiler could also save you, e.g., by checking whether aliasing occurs at runtime, eg checking whether the address of i falls in the range x – x + k. I’m fact, both clang and gcc do this for slightly simpler cases, where i is not incremented: if the runtime check passes, the fast loop is used, otherwise it falls back to the conservative one.
  1. Daniel Lemire says:
    
    September 5, 2019 at 10:30 pm
    Use different types.
    
    In my opinion, this is trickier than it sounds. Naively, I would think that ‘counter’ in the example below is of a type distinct from ‘uint64_t’… but I see the same bad optimization effect, possibly because the standard does not allow the compiler to forbid aliasing of this sort. And this says nothing about real C++ code where it is not always immediately obvious what the actual type is (due to the tendency of C++ programmers to abstract away types).
```
struct counter {
    uint64_t val;
    uint32_t garbage;
};
void incrementj(counter & i, uint64_t * x){
for(int k = 0; k < 1000000; k++) {
x[k] += i.val++;
}
}
```
me says:

September 6, 2019 at 6:11 am

Does using a local copy of i, and copying the final value back to the reference make a difference? So uint64_t j = i; for (; j < size; j++) ... i = j;?
1. Daniel Lemire says:
  
  September 6, 2019 at 1:00 pm
  
  Right: creating a local copy helps the compiler.
Michael says:

September 6, 2019 at 12:11 pm

You mention that Swift is likely affected by this, but it’s not. Swift enforces at a language level that inout parameters don’t alias. Usually at compile time, but with runtime precondition checks when it can’t prove it statically.

Swift code currently has other overhead that makes this function bigger than it’s C equivalent but those are separate issues. Largely due to the fact that an array in Swift is not just a pointer to data, but is in fact a copy-on-write ref-counted type that knows its own size and capacity and can change its size to fit its elements.
1. Daniel Lemire says:
  
  September 6, 2019 at 12:45 pm
  I make no claim regarding aliasing. I make a claim regarding performance.
  
  My claim is that this is going to be slower…
```
func fx(_ allints: inout [Int],  _ j : inout Int) {
      for k in allints.indices {
        allints[k] = j
        j &+= 1
      }
 }
```
  than the following…
```
 func f(_ allints: inout [Int],  _ i : inout Int) {
      var j = i
      for k in allints.indices {
        allints[k] = j
        j &+= 1
      }
      i = j
 }
```
  Now, admittedly, there is no fundamental reason for it when the array is large. Even in C++, the compiler could add a check for overlap at the start (knowing that the loop is quite long) and optimize it away. My point in this post is that current compilers (at least some popular ones) will let you down.
  
  I took a few seconds to write a Swift version of the benchmark…
```
$ swift -O reference.swift
ref 1.892044 ns
hard 0.554002 ns
```
  Basically, Swift exhibits exactly the same performance characteristics as C++ with GNU GCC 8. In fact, if you convert the nanoseconds into CPU cycles, you get very nearly the same performance down to a fraction of a cycle.
  
  I do not doubt that Swift “doesn’t alias”, as you explain… but it does not follow that the two functions have the same performance once compiled (with optimizations).
  1. Michael says:
    
    September 7, 2019 at 2:26 pm
    
    That’s quite bizarre, as inout variables are defined to be semantically exactly the same as copying the value to a local variable and copying the new value back to the original variable at the end of the function definition, so according to the rules of the language these functions should generate the exact same code.
    
    Do you mind if I include your benchmark in a bug report on the Swift compiler?
    1. Daniel Lemire says:
      
      September 7, 2019 at 2:36 pm
      
      Please do. Feel free to refer to my blog post as well. 🙂