17th December 2009, 21 min read

Entropy-efficient Computing

27 thoughts on “Entropy-efficient Computing”

Daniel Lemire says:

December 17, 2009 at 10:36 am

@Mikhalev I think it is entirely possible to address this problem from first principles, independently of specific hardware. Please see the two references I have added to my post and in particular Frank 2002.

More recently, there seems to be claims that quantum computation is reversible, thus does not produce any entropy (heat).
Daniel Lemire says:

December 17, 2009 at 10:40 am

@JN Shannon’s entropy is related to Boltzman’s entropy. If you lower the entropy of an array in memory, then some other entropy must increase, thus heat must be generated.

If you raise the entropy of some array, then I expect there is no need (in principle) for much heat dissipation.
Daniel Lemire says:

December 17, 2009 at 11:17 am

@Leifer

Given a fixed-memory computer, I don’t see how computation can be reversible. Any reference to this demonstration by Charlie Bennett?

As for the sorted array versus the shuffled array, I am working from a probabilistic model. Suppose the orderings of an array were picked at random. What is the probability that it will be sorted?
Michael Nielsen says:

December 17, 2009 at 12:25 pm

C. H. Bennett, “Logical reversibility of computation”, IBM J. Res. Dev., vol 17, pp 525-532 (1973).

It’s a classic. The trade-off between memory and reversibility has subsequently been studied at some length; I’ve only skimmed that literature, though, and don’t know the best references.

Lecerf also proved results about reversibility 10 years or so before Bennett. The paper is in French, though, which means I’ve never read it. I wish a translation were available.
Daniel Lemire says:

December 17, 2009 at 1:02 pm

@Nielsen

Thanks for the reference.

Fair enough. Reversible computations can buy high entropic efficiency. However, we don’t work with such machines. My hard drive generates heat when I write to it.

So, my question still stand.
Kevembuangga says:

December 17, 2009 at 1:44 pm

shuffling an array takes time O(n2) whereas sorting it takes time O(n log n)

Depends…
For a given key size sorting an array is only O(n), the bucket sort!
This seemingly counterintuitive fact is because the sorting time is actually proportional to the total keys volume (key size x number of keys).
Usually the keys are way more larger than the minimum needed value 2**n bits so both statements are “true” sorting is O(n log n) in the worst case and O(n) in most practical cases. 🙂
Daniel Lemire says:

December 17, 2009 at 1:54 pm

@Kevembuangga True. Presumably, you can also shuffle an array quite fast if there are few distinct values.
Kevembuangga says:

December 17, 2009 at 2:05 pm

Oops! I made it upside down the minimum key size is log2(n) bits not 2**n.
BTW, a bucket sort analysis is here.
Philippe Beaudoin says:

December 17, 2009 at 2:22 pm

rst I wanted to mention reversible computing… But then it stroke me as something we, computer scientists, do very often: refer back to theoretical results.

While these are great to get a strong insight on a problem, limiting our vision to theoretical results can sometime stop us from investigating problems on which we can make progress and have significant impact on the world. Last year I listened to a talk by UBC’s Kevin Leyton-Brown on empirical algorithms. To quote from his webpage: “Despite the fact that algorithms are human artifacts, I believe that their theoretical analysis should be supplemented using the same approaches that are used to study natural phenomena.” (http://people.cs.ubc.ca/~kevinlb/research.html)

I encourage you to read the rest. He makes the point that very complex algorithms have been devised to solve computationally hard problems. For these algorithms, empirical experiments can often give us a better insight into which variations are better, in practice, for the kind of problems we’re facing.

The computers we use — together with their array of peripherals — are also complex and ever-changing beasts. A principled and empirical study of their energy efficiency under different algorithms and different design decisions strikes me as an interesting research project. Is quicksort significantly less energy efficient than mergesort? Is there a way to design software as to minimize energy consuption? Are there “best practices” we can put in place so that both execution time and energy efficiency can be taken into account in a design decision? After all, the goal is to minimize money… And money is not just about time anymore.
Philippe Beaudoin says:

December 17, 2009 at 2:23 pm

Missing from the beginning of my comment:

At first […]
Daniel Lemire says:

December 17, 2009 at 2:49 pm

Thank you Philippe.
Seb says:

December 17, 2009 at 3:08 pm

Such awesome folks in this comment thread! I envy you Daniel. Just sayin’.
Alex Mikhalev says:

December 17, 2009 at 10:20 am

There are several measures of the entropy, mostly known in CS is Shennon’s entropy. I have attempted to apply this kind entropy analysis to algorithms, but have not progress too far – basically it is possible to calculate entropy of the function output using histograms and dictionaries as in image processing. But how will you choose between 110 functions for same problem?
I am not sure you would like to calculate Boltzman’s entropy for the algorithm as algorithms can be implemented not on common hardware and thus thermodynamics entropy will be different.
But if I am going to minimize Boltzman’s entropy I would start from vectorisation and minimization of O number, then moving into hardware level optimization such as SSE2 instruction and so on.
JN says:

December 17, 2009 at 10:35 am

Daniel,

I’m not sure I follow your reasoning on shuffling without entropy cost. I assume that “entropy cost” would be defined as the generation of thermodynamic entropy, as that is the result of converting usable energy to unusable.

We must make a distinction between thermodynamic entropy and information entropy in this case. I would assume that any process carried by a computer that involves the conversion of electrical power to heat will generate entropy. Even if shuffling would be performed without energy use, the information entropy of the shuffled array is higher than the original.

JN
Matt Leifer says:

December 17, 2009 at 11:08 am

I don’t understand the distinction between shuffling and sorting. Didn’t Charlie Bennett show that any computation can be effected reversibly in principle? If this is correct, then the theoretical limit on entropy production for any computation is zero.

Regarding quantum computation, it is no more inherently reversible or irreversible than classical computation. Early on in QC research there were claims that QC *had* to be reversible due to the unitarity of the dynamics, unlike classical computing, which could be done either reversibly or irreversibly. However, this ignored the possibility of using measurements within the computation. There are both reversible and irreversible models of quantum computation, just as there are in the classical case. For example, the cluster state model is definitely irreversible, whereas the circuit model is reversible. The claims you are referring to are probably being made in the context of the quantum circuit model, since this is the most common model that is used. However, in this case, it should be absolutely no surprise that it is reversible because it is a simple generalization of the *classical* reversible circuit model.
Matt Leifer says:

December 17, 2009 at 1:09 pm

OK, that makes sense now. I didn’t get the fixed memory bit before.

Does anyone know to what extent fixed memory quantum computing has been studied? I mean, I know of a few results about memory and time tradeoffs for particular algorithms, but they all assume that memory scales with input size. Can we do any quantum computation with constant memory? Given that it impacts reversibility, it seems like you might need to use some clever tricks to get it to work.
Daniel Lemire says:

December 18, 2009 at 7:59 am

@Kujala

(1) Suppose an algorithm runs faster, but keeps writing to disk large data structures, should you prefer it to an algorithm that avoids writing to disk as much as possible, but runs slower? (For example, the second algorithm might be using compression.)

(2) If you can use 4 CPUs to cut the running time in half, is that a good deal entropy-wise?
J Kujala says:

December 18, 2009 at 3:07 am

If we want to minimize heat generation, then just do as little work as possible. So it goes reduces to time complexity in theory?

In practise those crude tricks of improving hardware are likely to be best. Also, teaching programmers to use tricks such as doing work in bursts so that hardware is not on all the time.
Alex Mikhalev says:

December 18, 2009 at 5:03 am

Daniel thank you very much for this discussion. The connection between Boltzman entropy and Shannon entropy is fairly interesting and important for me personally and is interesting scientific project. I will check references and will try to made it practical, as I need the new scientific ways to choose between several algorithms for one problem.
Daniel Lemire says:

December 18, 2009 at 11:22 am

@Itman

That’s true. Shuffling an array can be done in linear time. I fixed my post (after 2000 people saw my mistake… there is a lesson somewhere…)

No sure what I was thinking about…
Daniel Lemire says:

December 18, 2009 at 11:48 am

@Itman Because the shuffled array has less or as much entropy as the initial array (Boltzman’s entropy ). It does not take work to shuffle the molecules of a gas, for example. It takes work to organize these molecules into a crystal though.
Daniel Lemire says:

December 18, 2009 at 1:40 pm

@Cyril

Given that I cannot get away by saying that sorting is in O(n log n), I would say that you are right to have faith in this blog. There is intense peer review!

I think that the Safari bug went away with the more recent versions of the browser.
Itman says:

December 18, 2009 at 11:17 am

Daniel,
If I understand it correctly, a random shuffling takes only n swaps and n calls to rand() function. How can it be O(n^2)?
Itman says:

December 18, 2009 at 11:36 am

@Daniel. No worries. We all make mistakes.
PS: I also wonder why would you expect energy-less shuffling?
Cyril says:

December 18, 2009 at 12:39 pm

To add a bit of pedantry to this otherwise interesting post, the lower bound on sorting is n.log(n) if you only use comparisons to sort.

Daniel, I have such faith in your knowledge and command of efficient algorithms that when I saw you claimed shuffling an array was O(n^2) I immediately assumed you meant a 2-dimensional, n x n array…

PS: the spam protection bug on Safari is annoying.
beza1e1 says:

December 21, 2009 at 5:57 pm

You note that parallel programming can be used to do the same amount of work in the same time with less energy for embarrassingly parallel tasks. Maybe this isn’t restricted to embarrassingly parallel stuff like serving web pages though. Maybe parallelism can be used to reduce energy consumption.
melvin gldstein says:

September 29, 2011 at 12:59 pm

Entropy is one of Physics Foibles. Murphys Law is the layman’s Entropy.