Daniel Lemire's blog

Daniel Lemire is a computer science professor at the Data Science Laboratory of the Université du Québec (TÉLUQ) in Montreal. His research is focused on software performance and data engineering. He is a techno-optimist and a free-speech advocate.

18th March 2019, 2 min read

Don’t read your data from a straw

3 thoughts on “Don’t read your data from a straw”

Me says:

March 19, 2019 at 6:47 am

One of the things that can hurt Java here is the classic big endian vs. little endian problem…
In a lot of cases, Java is prepared to swap endianness to be compatible across different CPU architectures, something for which in C you usually have to manually insert htonl and ntohl calls. Is all your code endianness-safe?
1. Daniel Lemire says:
  
  March 19, 2019 at 7:50 pm
  
  Thanks for raising this point.
  
  In the case above, we do Java-vs-Java comparisons so endianness is not an issue.
  
  In both Java and C/C++, you sometimes need to flip the bytes around. In C/C++, you have to check whether you have a big endian or little endian system, whereas with Java, the default byte order (for example, of a ByteBuffer) is always big endian, whatever the hardware. Yet, in my own experience, I have been able to safely assume that all systems I care about are little endian. So I have designed binary formats that are explicitly little endian.
  
  This being said, the computational burden of reversing byte order is tiny.
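To make the byte-order point concrete, here is a minimal Java sketch, not taken from the post itself. The class name `EndianDemo` and its two helper methods are hypothetical, chosen only for illustration. It decodes the same four bytes once with an explicit little-endian order and once with the default order of a `ByteBuffer`, then checks that `Integer.reverseBytes` converts between the two.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    // Decode a 32-bit integer from a format that is explicitly little endian.
    static int readLittleEndianInt(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    // Decode the same bytes using ByteBuffer's default order (big endian).
    static int readDefaultOrderInt(byte[] data, int offset) {
        return ByteBuffer.wrap(data).getInt(offset);
    }

    public static void main(String[] args) {
        byte[] data = {0x01, 0x02, 0x03, 0x04};
        int little = readLittleEndianInt(data, 0); // 0x04030201
        int big = readDefaultOrderInt(data, 0);    // 0x01020304
        System.out.println(Integer.toHexString(little)); // prints 4030201
        System.out.println(Integer.toHexString(big));    // prints 1020304
        // Reversing the bytes of one value yields the other.
        System.out.println(Integer.reverseBytes(big) == little); // prints true
    }
}
```

Setting the order once on the buffer means the calling code never has to ask which byte order the host machine uses, which is one way to implement a format that is explicitly little endian.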
Roman Leventov says:

March 23, 2019 at 11:08 pm

ByteBuffer has a very unfortunate API. Pretty much all systems/high-performance projects in Java (Netty, Aeron, Chronicle, etc.) reimplement it on their own.

We’ve built the Memory project (link) specifically to back up data structure implementations in Java. It is used in DataSketches (link).