You can run more than one thread per core to make better use of your memory resources, but you are going to have to pay the overhead of threads…
Oren Tirosh says:
Measuring with multiple cores or hyperthreads can expose the hardware limits, showing whether further optimization is theoretically possible – and predict whether this technique would boost the overall performance of a parallel application or only squeeze out gains in single-core benchmarks and niche scenarios.
Do you really mean page faults or TLB misses? You might also validate this by comparing the TLB reach of the processor to the data size.
Do you really mean page faults or TLB misses?
I meant TLB misses, thanks. I tried to avoid introducing the concept of the TLB in the post, hence the confusion.
Isn’t the limit due to memory-level parallelism, which you wrote about earlier?
https://lemire.me/blog/2018/11/13/memory-level-parallelism-intel-skylake-versus-apple-a12-a12x/
But we are still a long way from the level of parallelism we should have.
What about core- or hyperthread-level parallelism? Do they share the same resources and limits? Some of them?