10th August 2023, 6 min read

Coding of domain names to wire format at gigabytes per second

8 thoughts on “Coding of domain names to wire format at gigabytes per second”

KWillets says:

August 10, 2023 at 6:21 pm

Back when we were trying to validate UTF-8, I wracked my brain on how to get correct jump distances like this.

The trick is to exponentiate a permutation of the following form:

P[i] = i if position i has a dot

P[i] = i+1 if not

In SIMD terms that’s a constant vector plus the comparison mask:

P=
1 2 3 4 5 6 7 8 + -1 0 -1 0 0 0 0 -1
= 0 2 2 4 5 6 7 7

P^2 = 0 2 2 5 6 7 7 7 7

P^4 = 0 2 2 7 7 7 7 7

P has a fixed point at each dot, and the other locations fill from the right. The limit is an eigenvector under P.
1. Daniel Lemire says:
  
  August 10, 2023 at 7:59 pm
  
  Damn it. Let me try.
  1. Daniel Lemire says:
    
    August 10, 2023 at 9:04 pm
    
    Ok. I think it is applicable, but it requires more work than I expected. I will need to come back to it.
2. Noah goldstein says:
  
  August 13, 2023 at 2:17 am
  
  What operation is it for P^2 / P^4. Seems to be neither power nor xor.
  1. KWillets says:
    
    August 14, 2023 at 6:03 pm
    
    P^2 means pshufb(P,P); sorry for the rough notation. That’s the shuffle mask equivalent to applying P twice (shuffle masks are composable, subject to certain constraints such as out-of-range entries being replaced by 0).
Jeroen Koekkoek says:

August 28, 2023 at 7:15 am

Only a minor remark, but the wire format for domain names is slightly different. Given lemire.me, the name contains three labels, one of length 6 (lemire), one of length 2 (me) and the root label. The correct wire format is 6lemire2me0 (in URLs the trailing dot is often implicit).

Additional background info: A resolver starts looking up information by first querying the . (root server), which responds with the name server for .me, which responds with the location for lemire.me..
1. Daniel Lemire says:
  
  August 28, 2023 at 1:02 pm
  Thanks. The blog post does not describe the final zero. However, the code does produce a trailing zero. In the case where the input is lemire.me, you get the following (see unit tests):
```
input:
6c 65 6d 69 72 65 2e 6d 65
name_to_dnswire_simd output:
06 6c 65 6d 69 72 65 02 6d 65 00
name_to_dnswire output:
06 6c 65 6d 69 72 65 02 6d 65 00
name_to_dnswire_scalar_labels output:
06 6c 65 6d 69 72 65 02 6d 65 00
name_to_dnswire_avx output:
06 6c 65 6d 69 72 65 02 6d 65 00
name_to_dnswire_idx_avx output:
06 6c 65 6d 69 72 65 02 6d 65 00
```
  1. Isaac Kabel says:
    
    August 29, 2023 at 2:57 am
    
    The non-simd function works in the test because it has a final *counter = 0; whereas the simplified code in the article excludes it and is wrong as a result.
    
    I sympathize with excluding the label length validation and is_name_char implementation, both clearly for brevity’s sake, but the null terminator is a matter of correctness.
    
    I (and presumably Jeroen too) would have found your introduction easier to understand since I think of the DNS name wire format the other way (dots are separators between labels, there’s a trailing empty label, labels are serialized with a length prefix byte). From your description, I figured you’d have to add the null terminator after the loop, so when I didn’t see it in your code, I mentally desk checked it to make sure I wasn’t misunderstanding.