Daniel Lemire's blog

String representations are not unique: learn to normalize!

2 min read

Most strings in software today are represented using the Unicode standard, which can represent most human-readable strings. Unicode works by representing each ‘character’ as a numerical value (called a code point) between 0 and 1 114 111. Thus the character é is typically…
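To illustrate the non-uniqueness named in the title, here is a minimal Python sketch using the standard unicodedata module. It assumes the two usual representations of é: the single code point U+00E9, and the letter e (U+0065) followed by a combining acute accent (U+0301). The variable names are illustrative only and do not come from the post.

```python
import unicodedata

# Two representations of the same visible character 'é'.
composed = "\u00e9"      # single code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# The strings look identical when displayed but compare as different.
same_before = composed == decomposed

# Normalizing both to NFC (the composed form) makes them comparable.
nfc_equal = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

# NFD goes the other way and splits the character into two code points.
nfd_length = len(unicodedata.normalize("NFD", composed))

print(same_before, nfc_equal, nfd_length)  # False True 2
```

Normalizing strings to a single form before comparing or hashing them avoids treating visually identical strings as distinct.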

Converting integers to decimal strings faster with AVX-512

4 min read

In most systems, integers are stored using a fixed binary representation. It is common to store integers using 32-bit or 64-bit words. You sometimes need to convert an integer to a string. For example, the integer 12345 might need to be converted to the five characters ‘12345’. In an earlier blog post,…

Writing out large arrays in Go: binary.Write is inefficient for large arrays

2 min read

Programmers often need to write data structures to disk or to networks. The data structure then needs to be interpreted as a sequence of bytes. Regarding integer values, most computer systems adopt “little endian” encoding, whereby an 8-byte integer is written out using the least significant…
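As a small illustration of little-endian encoding, the following Python sketch serializes an 8-byte integer with the standard struct module. The value is arbitrary and chosen only so that each byte is easy to recognize; the post itself concerns Go, so this is an analogy rather than the code it discusses.

```python
import struct

value = 0x0102030405060708  # arbitrary 8-byte integer chosen for illustration

# '<Q' means little-endian, unsigned 64-bit: least significant byte first.
little = struct.pack("<Q", value)
# '>Q' is the big-endian counterpart: most significant byte first.
big = struct.pack(">Q", value)

print(little.hex())  # 0807060504030201
print(big.hex())     # 0102030405060708

# Reading the bytes back with the matching byte order recovers the value.
roundtrip = int.from_bytes(little, "little")
print(roundtrip == value)  # True
```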

Enforcement by software

3 min read

At my university, one of our internal software systems allows a professor to submit a revision to a course. The professor might change the content or the objectives of the course. In a university, professors have extensive freedom regarding course content. As long as you reasonably meet the course…

The Canadian Common CV and the captured academy

3 min read

Most Canadian academics have to write their resumes using a government online tool called the Common CV. When it was first introduced, it was described as a time-saving tool: instead of writing your resume multiple times for different grant agencies, you would write it just once and be done with…

How many digits in a product?

2 min read

We often represent integers with digits. E.g., the integer 1234 has 4 digits. By extension, we use ‘binary’ digits, called bits, within computers. Thus the integer 7 requires three bits: 0b111. If I have two integers that use 3 digits, say, how many digits will their product…
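The excerpt is cut off before the answer, so the following is only an empirical check, not the post's argument. This Python sketch enumerates every pair of 3-digit decimal integers and every pair of 3-bit integers, and records how many digits (or bits) each product needs. The helper name digit_count is hypothetical.

```python
def digit_count(n: int) -> int:
    """Number of decimal digits of a positive integer."""
    return len(str(n))

# All products of two 3-digit decimal integers (100..999).
decimal_counts = {
    digit_count(a * b) for a in range(100, 1000) for b in range(100, 1000)
}
print(sorted(decimal_counts))  # [5, 6]

# All products of two 3-bit integers (0b100..0b111).
bit_counts = {(a * b).bit_length() for a in range(4, 8) for b in range(4, 8)}
print(sorted(bit_counts))  # [5, 6]
```

In both bases the product of two 3-digit values needs either 5 or 6 digits: the smallest case is 100 × 100 = 10000 and the largest is 999 × 999 = 998001.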

The end of the monopolistic web?

2 min read

Except maybe in totalitarian states, you cannot have a single publisher. Most large cities had multiple independent newspapers. In recent years, we saw a surge of concentration in newspaper and television ownership. However, this was accompanied by a surge of online journalism. The total number of…

SWAR explained: parsing eight digits

4 min read

It is common to want to parse long strings of digits into integer values. Because it is a common task, we want to optimize it as much as possible. In the blog post, Quickly parsing eight digits, I presented a very quick way to parse eight ASCII characters representing an integer (e.g., 12345678)…
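For readers unfamiliar with SWAR (SIMD within a register), here is a sketch of one widely circulated formulation of eight-digit parsing, transcribed into Python. The excerpt is truncated, so the constants and steps below are an assumption and may differ from the post's exact code. Python integers are unbounded, so the sketch masks to 64 bits to emulate the wraparound of a 64-bit register. The function name is hypothetical and the input is assumed to be eight valid ASCII digits.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # emulate 64-bit unsigned wraparound

def parse_eight_digits_swar(text: bytes) -> int:
    """Parse exactly eight ASCII digits; the input is not validated."""
    # Load the eight bytes so the first digit lands in the lowest byte.
    val = int.from_bytes(text[:8], "little")
    # Turn each ASCII byte '0'..'9' into its numeric value 0..9.
    val = (val - 0x3030303030303030) & MASK64
    # Bytes 0, 2, 4 and 6 now hold two-digit values (10 * d[i] + d[i+1]).
    val = (val * 10 + (val >> 8)) & MASK64
    mask = 0x000000FF000000FF
    mul1 = 100 + (1000000 << 32)
    mul2 = 1 + (10000 << 32)
    # Combine the four two-digit values with two multiplications;
    # the answer accumulates in the upper 32 bits.
    part1 = ((val & mask) * mul1) & MASK64
    part2 = (((val >> 16) & mask) * mul2) & MASK64
    return ((part1 + part2) & MASK64) >> 32

print(parse_eight_digits_swar(b"12345678"))  # 12345678
```

The point of the technique is that all eight digits are processed together with a handful of arithmetic operations on one 64-bit word, instead of one digit per loop iteration.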

What is the ‘range’ of a number type?

1 min read

In programming, we often represent numbers using types that have specific ranges. For example, 64-bit signed integer types can represent all integers between -9223372036854775808 and 9223372036854775807, inclusive. All integers inside this range are valid; all integers outside are “out of…
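The two bounds quoted above are -2^63 and 2^63 - 1. A minimal Python sketch, with hypothetical names, computes them and checks whether a value falls inside the range:

```python
INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1

def fits_in_int64(n: int) -> bool:
    """True if n is representable as a 64-bit signed integer."""
    return INT64_MIN <= n <= INT64_MAX

print(INT64_MIN)  # -9223372036854775808
print(INT64_MAX)  # 9223372036854775807
print(fits_in_int64(INT64_MAX), fits_in_int64(INT64_MAX + 1))  # True False
```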

How programmers make sure that their software is correct

16 min read

Our most important goal in writing software is that it be correct. The software must do what the programmer wants it to do. It must meet the needs of the user. In the business world, double-entry bookkeeping is the idea that transactions are recorded in at least two accounts (debit and credit). One…

Science and Technology links (December 19th 2021)

1 min read

Becoming a physician increases the use of antidepressants, opioids, anxiolytics, and sedatives, especially for female physicians. When trying to reproduce results in cancer research, independent researchers find that the benefits are typically grossly exaggerated. A large planet has been found…

Science and Technology links (December 4th 2021)

2 min read

It used to be that all the exciting new processors came from Intel and AMD, and they were meant for your PC. The mobile revolution changed that: it led to the production of fantastic processors that used little energy. We are now moving back into laptops and servers. The leading supplier of…

Can you safely parse a double when you need a float?

2 min read

In C as well as many other programming languages, we have 32-bit and 64-bit floating-point numbers. They are often referred to as float and double. Most systems today follow the IEEE 754 standard, which means that you can get consistent results across programming languages and operating systems.…

Science and Technology links (November 28th 2021)

1 min read

Government-funded research is getting more political and less diverse: The frequency of documents containing highly politicized terms has been increasing consistently over the last three decades. The most politicized field is Education & Human Resources. The least are Mathematical &…

Are tenured professors more likely to speak freely?

3 min read

University professors often have robust job security after a time: they receive tenure. It means that they usually do not have to worry about applying for a new job after a few years. Tenure is not available in all countries. Countries like Australia reassess positions every few years. So why does…

Converting integers to fixed-digit representations quickly

5 min read

It is tricky to convert integers into strings because the number of characters can vary according to the magnitude of the integer. The integer ‘1’ requires a single character whereas the integer ‘100’ requires three characters. So a solution might possibly need a hard-to-predict branch. Let…

Science and Technology links (November 13th 2021)

1 min read

Pacific rougheye rockfish can live hundreds of years while other rockfish barely live past ten years. Female condors can reproduce without males. The phenomenon is known as parthenogenesis and it occurs in birds such as chickens. It does not happen in mammals naturally as far as we…

Checking simple equations or inequalities with z3

3 min read

When programming, you sometimes need to make sure that a given formula is correct. Of course, you can rely on your mastery of high-school mathematics, but human beings, in general, are terrible at formal mathematics. Thankfully, you can outsource simple problems to a software library. If you are a…

Stop spending so much time being trolled by billionaire corporations!

5 min read

As a kid, my parents would turn on the television set, and we would get to watch whatever the state television decided we would watch. It was a push model. Some experts pick the content you need and they deliver it to you. You have little say in the matter. There was one newspaper in my town. We had…