Ok. I think it is applicable, but it requires more work than I expected. I will need to come back to it.
Noah goldsteinsays:
What operation is it for P^2 / P^4. Seems to be neither power nor xor.
KWilletssays:
P^2 means pshufb(P,P); sorry for the rough notation. That’s the shuffle mask equivalent to applying P twice (shuffle masks are composable, subject to certain constraints such as out-of-range entries being replaced by 0).
Jeroen Koekkoeksays:
Only a minor remark, but the wire format for domain names is slightly different. Given lemire.me, the name contains three labels, one of length 6 (lemire), one of length 2 (me) and the root label. The correct wire format is 6lemire2me0 (in URLs the trailing dot is often implicit).
Additional background info: A resolver starts looking up information by first querying the . (root server), which responds with the name server for .me, which responds with the location for lemire.me..
Thanks. The blog post does not describe the final zero. However, the code does produce a trailing zero. In the case where the input is lemire.me, you get the following (see unit tests):
The non-simd function works in the test because it has a final *counter = 0; whereas the simplified code in the article excludes it and is wrong as a result.
I sympathize with excluding the label length validation and is_name_char implementation, both clearly for brevity’s sake, but the null terminator is a matter of correctness.
I (and presumably Jeroen too) would have found your introduction easier to understand since I think of the DNS name wire format the other way (dots are separators between labels, there’s a trailing empty label, labels are serialized with a length prefix byte). From your description, I figured you’d have to add the null terminator after the loop, so when I didn’t see it in your code, I mentally desk checked it to make sure I wasn’t misunderstanding.
Back when we were trying to validate UTF-8, I wracked my brain on how to get correct jump distances like this.
The trick is to exponentiate a permutation of the following form:
P[i] = i if position i has a dot
P[i] = i+1 if not
In SIMD terms that’s a constant vector plus the comparison mask:
P=
1 2 3 4 5 6 7 8
+-1 0 -1 0 0 0 0 -1
=
0 2 2 4 5 6 7 7
P^2 =
0 2 2 5 6 7 7 7 7
P^4 =
0 2 2 7 7 7 7 7
P has a fixed point at each dot, and the other locations fill from the right. The limit is an eigenvector under P.
Damn it. Let me try.
Ok. I think it is applicable, but it requires more work than I expected. I will need to come back to it.
What operation is it for P^2 / P^4. Seems to be neither power nor xor.
P^2 means pshufb(P,P); sorry for the rough notation. That’s the shuffle mask equivalent to applying P twice (shuffle masks are composable, subject to certain constraints such as out-of-range entries being replaced by 0).
Only a minor remark, but the wire format for domain names is slightly different. Given lemire.me, the name contains three labels, one of length 6 (lemire), one of length 2 (me) and the root label. The correct wire format is 6lemire2me0 (in URLs the trailing dot is often implicit).
Additional background info: A resolver starts looking up information by first querying the
.
(root server), which responds with the name server for.me
, which responds with the location forlemire.me.
.Thanks. The blog post does not describe the final zero. However, the code does produce a trailing zero. In the case where the input is lemire.me, you get the following (see unit tests):
The non-simd function works in the test because it has a final
*counter = 0;
whereas the simplified code in the article excludes it and is wrong as a result.I sympathize with excluding the label length validation and
is_name_char
implementation, both clearly for brevity’s sake, but the null terminator is a matter of correctness.I (and presumably Jeroen too) would have found your introduction easier to understand since I think of the DNS name wire format the other way (dots are separators between labels, there’s a trailing empty label, labels are serialized with a length prefix byte). From your description, I figured you’d have to add the null terminator after the loop, so when I didn’t see it in your code, I mentally desk checked it to make sure I wasn’t misunderstanding.