I did not realize that Intel had improved gather performance in their latest processors. I have a few things I wanted to try speeding up with gather, but since it wasn’t any faster than sequential loads on Haswell I’d shelved those ideas. The most straightforward one is a base64 decoder that uses a 65,536-entry lookup table to look up 8 groups of 2 input bytes at a time (16 base64 characters) and decode them into 12 bytes of output. Not sure if it’ll be faster than a conventional decoder, but it’s probably worth testing.
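To make the idea concrete, here is a minimal scalar sketch in C of the pair-table lookup, not a tuned or vectorized implementation. The names (`pair_table`, `build_pair_table`, `decode16`) are made up for illustration, the table is indexed as `(first_char << 8) | second_char` (an assumption; a real version might index by the little-endian 16-bit load instead), and padding (`=`) is not handled. The eight table lookups in `decode16` are the part a gather would replace; since AVX2 gathers work on 32-bit or 64-bit elements, a vectorized version would presumably need wider table entries or some masking.

```c
#include <stddef.h>
#include <stdint.h>

#define B64_INVALID 0xFFFFu  /* sentinel: valid entries only use the low 12 bits */

/* 65,536-entry table indexed by a pair of input characters.
   Each valid entry holds the 12 decoded bits for that pair. */
static uint16_t pair_table[65536];

/* Map one base64 character to its 6-bit value, or -1 if it is not in the alphabet. */
static int sextet(unsigned c) {
    if (c >= 'A' && c <= 'Z') return (int)(c - 'A');
    if (c >= 'a' && c <= 'z') return (int)(c - 'a') + 26;
    if (c >= '0' && c <= '9') return (int)(c - '0') + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Fill the table once at startup. */
static void build_pair_table(void) {
    for (unsigned hi = 0; hi < 256; hi++) {
        for (unsigned lo = 0; lo < 256; lo++) {
            int a = sextet(hi), b = sextet(lo);
            pair_table[(hi << 8) | lo] =
                (a < 0 || b < 0) ? B64_INVALID : (uint16_t)((a << 6) | b);
        }
    }
}

/* Decode 16 base64 characters into 12 bytes.
   Returns 0 on success, -1 if any pair contains an invalid character. */
static int decode16(const unsigned char in[16], unsigned char out[12]) {
    uint16_t v[8];
    for (int i = 0; i < 8; i++) {
        /* These 8 independent lookups are what a gather would do in one go. */
        v[i] = pair_table[((unsigned)in[2 * i] << 8) | in[2 * i + 1]];
        if (v[i] == B64_INVALID) return -1;
    }
    for (int i = 0; i < 4; i++) {
        /* Two 12-bit values make 24 bits, i.e. 3 output bytes. */
        uint32_t w = ((uint32_t)v[2 * i] << 12) | v[2 * i + 1];
        out[3 * i]     = (unsigned char)(w >> 16);
        out[3 * i + 1] = (unsigned char)((w >> 8) & 0xFF);
        out[3 * i + 2] = (unsigned char)(w & 0xFF);
    }
    return 0;
}
```

Call `build_pair_table()` once, then `decode16()` per 16-character block. Whether the gathered version beats a conventional decoder would still have to be measured, since the table (128 KB with 16-bit entries) is larger than a typical L1 data cache.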
Hi! Do you think this instruction could be used in a search scenario, to gather the “docvalues” of matching docs (for scoring or for aggregating statistics)?
Yes, it definitely can be used within a search engine.
Thanks!