Hi Daniel, I think it was not caused by different compilers, but by different malloc implementations, or by the same malloc on different CPU architectures/OSes.
Silverback says:
Can you test this using Zig?
me says:
Thank you. On glibc it is even more than I would have expected. Surprising to see that Apple (NetBSD?) fares better.
I guess with C++, a common improvement is to use Boost.Pool if you have many such tiny objects. https://www.boost.org/doc/libs/1_80_0/libs/pool/doc/html/boost_pool/pool/introduction.html
Given that Java stores (and exposes) the length of the array (i.e., Java arrays are more like a struct { int32 length; byte[] data }) it does fairly well in memory overhead. I would have thought that for a pure byte[4] allocation, C can do with 4-8 bytes.
Patrick Van Cauteren says:
It indeed depends on the implementation. I have an implementation (overriding new and delete) that aligns on 8 bytes, so one million arrays of 4 bytes would only require 8 MiB (plus a small fraction for some housekeeping).
In theory it would be possible to write an implementation that requires even less (4 MiB), assuming that it’s ok to align 4 byte allocations on an address that’s a multiple of 4 bytes.
Wild Pointer says:
You’re not measuring just the array, you’re measuring the array + malloc overhead + cache alignment overhead + whatever implementation overhead. Pretty common knowledge going back as far as I can remember.
Yakov says:
Daniel, you should not expect the idiomatic usage of C++, or C for that matter, to be terribly efficient.
I find that the implementation drifts away from the standard as the requirements do.
If your memory consumption and/or allocation latency hurts you, you quickly discover custom allocators, how to replace malloc with something thinner and faster, and some more things.
I’d also recommend reading Ulrich Drepper’s “What Every Programmer Should Know About Memory”.
😀
Yakov
The point to be made here is that there is a cost, in both size and time, to small allocations. Maybe you already know this, but more than a few of our peers are foggy on the topic.
Have you ever read code from others that contains:
1. Heap allocation that could be static?
2. Many small heap allocations in a much-repeated loop?
3. One-at-a-time heap allocation of a large number of objects of a single type?
To the performance-oriented folk – for the mental itch invoked by the above – you are welcome. 🙂
Keep in mind that the average programmer is just that. This sort of reminder is not out of place.