It would be more interesting if you could please provide some specific examples and how you narrowed it down to a single dependency and/or a specific function. Were you using a profiler to determine the bottleneck? Can the techniques be automated? For example, is there a tool that can take “too slow” code and make it “really fast”?
I have updated my blog post with more concrete recommendations and examples.
Thank you very much. That is helpful.
Great point. And one should take into account that a supercomputer nowadays is usually just a bunch of loosely connected GPUs or CPUs sitting in different boxes. Not only would I disagree with your claim that parallelization is easy, I would say it is super hard. For example, taking a framework like MapReduce and running your parallel code can easily take as long as, or longer than, the serial version. The problem is that once you have loosely connected computing units, communication becomes an annoying bottleneck.
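The communication bottleneck the comment describes shows up even on a single machine. This is a minimal sketch (not from the post) using Python's standard multiprocessing module: when each work item is trivial, the cost of serializing data to worker processes and back can dominate, and the parallel run is often no faster, or slower, than the serial one.

```python
# Sketch of the communication-overhead problem: for tiny work items,
# shipping data between processes can cost more than the computation.
import time
from multiprocessing import Pool

def tiny_task(x):
    return x + 1          # almost no computation per item

if __name__ == "__main__":
    data = list(range(100_000))

    t0 = time.perf_counter()
    serial = [tiny_task(x) for x in data]
    t_serial = time.perf_counter() - t0

    t0 = time.perf_counter()
    with Pool() as pool:
        parallel = pool.map(tiny_task, data)
    t_parallel = time.perf_counter() - t0

    # Same answer either way; the parallel run is often *slower* here
    # because each item is pickled, sent to a worker, and sent back.
    assert serial == parallel
    print(f"serial: {t_serial:.3f}s  parallel: {t_parallel:.3f}s")
```

The fix is usually to make each unit of work large relative to the data that must cross the process (or network) boundary.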
Parallel processing is sometimes easy because other people made it easy for us.
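As a concrete illustration of "other people made it easy", here is a minimal sketch using Python's standard concurrent.futures module (slow_square is a hypothetical stand-in for a CPU-heavy function): the library hides process management, work distribution, and result collection, so the serial and parallel versions differ by essentially one line.

```python
# "Easy" parallelism: concurrent.futures hides the hard parts.
from concurrent.futures import ProcessPoolExecutor

def slow_square(x):
    """Stand-in for a CPU-heavy function."""
    return x * x

if __name__ == "__main__":
    data = list(range(10))

    # Serial version.
    serial = [slow_square(x) for x in data]

    # Parallel version: the library does the rest.
    with ProcessPoolExecutor() as pool:
        parallel = list(pool.map(slow_square, data))

    assert serial == parallel
```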
PS: regarding the joys of parallelization, there is a good joke: knock-knock, race condition, who’s there?
Vishal Belsare says:
“Some software libraries are clever and do this work for you… but if you wrote your code without care for performance, it is likely you did not select these clever libraries.”
Daniel, can you please mention the clever libraries you referred to? It would be helpful to know.
I am not sure it is of general interest, but here are some examples.
Under R, the ‘boot’ package makes it really easy to parallelize the processing; one just needs to add a flag.
In Python, some NumPy operations parallelize automatically (e.g., numpy.dot).
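To make the numpy.dot point concrete: the call dispatches to whatever BLAS library NumPy was built against, and with a multithreaded BLAS (OpenBLAS, MKL) the matrix multiply runs on several cores with no change to the calling code. A small sketch:

```python
# numpy.dot delegates to the underlying BLAS; with OpenBLAS or MKL
# the multiply is multi-threaded, with no change to the caller.
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((500, 500))
b = rng.random((500, 500))

c = np.dot(a, b)          # potentially multi-threaded inside the BLAS call
assert c.shape == (500, 500)

# Sanity check one entry against a manual dot product.
assert abs(c[0, 0] - sum(a[0, k] * b[k, 0] for k in range(500))) < 1e-9
```

Whether it actually runs in parallel depends on the BLAS NumPy was linked against and on environment settings such as OMP_NUM_THREADS.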
And so forth.
Yaakov says:
At one place I worked, we had a data-crunching program that we thought ran reasonably well; it took 45 minutes. In the process of adding a feature, a coworker cleaned up the code (mostly rearranging do loops and the like). Afterwards the program ran in 3 minutes, which surprised everyone, including the coworker.
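The anecdote's actual code isn't shown, but loop rearrangement often pays off through memory access order. A hedged sketch of the general idea: walking a 2-D array in the order it is laid out in memory (row-major by default for NumPy) touches contiguous addresses and is far friendlier to the cache than striding down columns, even though both loop orders compute the same answer.

```python
# Illustration (not the anecdote's program): same result, different
# memory access pattern. Row-major traversal matches the array layout.
import numpy as np

matrix = np.arange(250_000, dtype=np.int64).reshape(500, 500)

def sum_row_major(m):
    total = 0
    for i in range(m.shape[0]):       # walk each contiguous row
        for j in range(m.shape[1]):
            total += m[i, j]
    return total

def sum_column_major(m):
    total = 0
    for j in range(m.shape[1]):       # stride across rows: cache-hostile
        for i in range(m.shape[0]):
            total += m[i, j]
    return total

# Both orders compute the same sum; only the access pattern differs.
assert sum_row_major(matrix) == sum_column_major(matrix)
```

In interpreted Python the interpreter overhead masks much of the effect; in Fortran, C, or compiled numerical kernels (the anecdote mentions do loops), the cache-friendly order can be many times faster.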
The same is true for Spark. https://lnkd.in/fCsrKXj
Using modern languages like Go and Rust, where memory management and concurrency are built in, is the best way to go for new projects. Also note that parallelism is not the only route to concurrency: simple goroutines can bring a lot of performance gain.
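The concurrency-without-parallelism point carries over to other languages too. A minimal Python sketch (an assumed analogue of the goroutine case, using standard threads): when the work is I/O-bound, concurrent threads overlap their *waiting*, so total time approaches the longest single wait rather than the sum, even with no parallel CPU execution.

```python
# Concurrency without parallelism: threads overlap simulated I/O waits.
import threading
import time

def fetch(delay, results, index):
    """Stand-in for an I/O-bound call (network, disk, ...)."""
    time.sleep(delay)
    results[index] = delay

delays = [0.1, 0.1, 0.1]
results = [None] * len(delays)

t0 = time.perf_counter()
threads = [threading.Thread(target=fetch, args=(d, results, i))
           for i, d in enumerate(delays)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - t0

assert results == delays
# The three 0.1 s waits overlap instead of summing to 0.3 s.
assert elapsed < sum(delays)
```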