On the Internet, we often use 32-bit addresses which we serialize as strings such as 192.128.0.1. The string corresponds to the Integer address 0xc0800001 (3229614081 in decimal).
How might you serialize, go from the integer to the string, efficiently in C++?
The simplest code in modern C++ might…
You sometimes want to add a string to an existing data structure. For example, the C++17 template ‘std::optional’ may be used to represent a possible string value. You may copy it there, as this code would often do…
std::string mystring;
std::optional<std::string> myoption;
myoption =…
Originally, the domain part of a web address was all ASCII (so no accents, no emojis, no Chinese characters). This was extended a long time ago thanks to something called internationalized domain name (IDN).
Today, in theory, you can use any Unicode character you like as part of a domain name,…
For under $600, one can buy a 20-terabyte disk on Amazon. Unless you work professionally in multimedia, it is more storage than you need. However, having much storage it, by itself, of little use if you cannot access it. Thankfully, you can buy a 1-terabyte “disk” for $200 that provides over…
The year 2022 is over. As with every year that passes, we have made some scientific progress. I found the following achievements interesting:
Diluting the blood plasma of older human beings rejuvenate them.
In a nuclear fusion reactor, we have technically produced more energy than we put in.
You…
We often have to represent in software a value that might be missing. Different programming languages have abstraction for this purpose.
A recent version of C++ (C++17) introduces std::optional templates. It is kind of neat. You can write code that prints a string, or a warning if no string is…
Most systems today rely on Unicode strings. However, we have two popular Unicode formats: UTF-8 and UTF-16. We often need to convert from one format to the other. For example, you might have a database formatted with UTF-16, but you need to produce JSON documents using UTF-8. This conversion is…
Most domain names are encoded using ASCII (e.g., yahoo.com). However, you can register domain names with almost any character in them. For example, there is a web site at 💩.la called poopla. Yet the underlying infrastructure is basically pure ASCII. To make it work, the text of your domain is…
Suppose that I give you a set of reference strings (“ftp”, “file”, “http”, “https”, “ws”, “wss”). Given a new string, you want to quickly tell whether it is part of this set.
You might use a regular expression but it is unlikely to be fast in general:
const std::regex…
OpenAI made public a new tool called ChatGPT. It is widely regarding as a practical breakthrough in artificial intelligence. Given a question, it can produce a coherent essay-length answer. Last week, my employer held a meeting to discuss how it will impact college classes. OpenAI expects to make…
Given binary data, we often need to encode it as ASCII text. Email and much of the web effectively works in this manner.
A popular format for this purpose is base64. With Muła, we showed that we could achieve excellent speed using vector instructions on commodity processors (2018, 2020). However,…
storing 1 GiB/month on the cloud
0.02$US
web site of my twitter profile (@lemire), HTML alone
296 KiB
web site of my twitter profile (@lemire), all data
3.9 MiB
Google result for ‘Canada’, HTML alone
848 KiB
Google result for ‘Canada’, all data
3.7 MiB
Node JS runtime
164…
In C, the length of a string in marked by a 0 byte at the end of the string. Thus to determine the length of the string, one must scan it, looking for the 0 byte. Recent ARM processors have a powerful instruction set (SVE) that is well suited for such problems. It allows you to load large registers…
Suppose you would like to check that a string is not present in a large document. In C, you might do the following using the standard function strstr:
bool is_present = strstr(mydocument, needle);
It is simple and likely very fast. The implementation is algorithmically sophisticated.
Can you do…
In an earlier blog post, I reported that the memory usage of a small byte array in Java (e.g., an array containing 4 bytes) was about 24 bytes. In other words: allocating small blocks of memory has substantial overhead.
What happens in C++?
To find out, I can try to allocate one million 4-byte…
As we focus on some types of unfortunate discrimination (race, gender), we may become blind to other types of discrimination. For example, tend to discrimate against ugly people, short men, old people, and so on.
Life may have emerged on Earth thanks to ‘aqueous microdroplets’.
Naked mole rats…
Let us say that I ask you to find the number I am thinking about between -1000 and 1000, by repeatedly guessing a number. With each guess, I tell you whether your guess is correct, smaller or larger than my number. A binary search algorithm tries to find a value in an interval by repeating finding…
Modern processors have powerful vector instructions which allow you to load several values at once, and operate (in one instruction) on all these values. Similarly, they allow you to have vector constants. Thus if you wanted to add some integer (say 10001) to all integers in a large array, you…