Hi Chris,
I just came across this post and I quite like it. However, this line
confuses me:
> In typical C implementations, the structure fields would be passed
> practically, if not exactly, same way as the individual parameters
> would have been passed, so it’s really no less efficient.
I think that ABI function calling conventions control this, so the C
implementation cannot do much about it. For example, on x86-64, SysV
requires that only parameters smaller than 16 bytes would be passed in
this way [0], and the Windows x64 calling convention only allows 8 bytes.
So I'd expect functions that take fat pointers to be less efficient
in general. Did I miss or misunderstand something? Thanks!
[0]: https://godbolt.org/z/zqrzvKxfz
Best,
Pengji
Thanks for writing, Pengji!
> on x86-64, SysV requires that only parameters smaller than 16 bytes
> would be passed in this way
Smaller or equal! Your struct fat is passed exactly as I suggested, with
"ptr" in rdi and "len" in rsi. So my statement is true for the 2-element
struct discussed at that point in my article.
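Here is a minimal sketch of the two-field case, assuming an LP64 target such as x86-64 Linux. The `fat` name follows the godbolt example. The `int` element type and the helper names `sum_fat`/`sum_parts` are hypothetical. The struct is exactly two eightbytes, so under SysV both functions should receive their arguments in the same two registers.

```c
#include <assert.h>
#include <stddef.h>

// Two-field "fat pointer": pointer plus length (element type is an assumption).
// On LP64 this is 16 bytes, i.e. two eightbytes.
typedef struct {
    int      *ptr;
    ptrdiff_t len;
} fat;

// Fields passed as separate parameters: ptr in rdi, len in rsi (SysV x86-64).
long sum_parts(int *ptr, ptrdiff_t len)
{
    assert(len >= 0);
    long total = 0;
    for (ptrdiff_t i = 0; i < len; i++) {
        total += ptr[i];
    }
    return total;
}

// Same data passed as one struct by value: under SysV x86-64 the struct is
// classified as two INTEGER eightbytes, so s.ptr arrives in rdi and s.len in rsi.
long sum_fat(fat s)
{
    return sum_parts(s.ptr, s.len);
}
```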
However, you're absolutely right about 3-element slice structs, which is
where I was heading in my article. I hadn't realized the "two eightbytes"
limit specified in the SysV ABI under 3.2.3-5c. That's too bad! I wish it
were just a little larger, perhaps 32 bytes. I'm guessing 16 bytes was
chosen for congruence with SSE. Looks like AArch64 is basically in the
same boat. I've just added a small note about your correction.
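For contrast, here is a sketch of a hypothetical 3-element slice header. The `slice` and `push` names are illustrative. On LP64 it is 24 bytes, which exceeds the two-eightbyte limit. Under SysV x86-64, a by-value `slice` argument is therefore passed in memory on the stack rather than in registers.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical Go-style slice header: 24 bytes on LP64 targets, so it is
// classified MEMORY by the SysV x86-64 ABI and passed on the stack.
typedef struct {
    int      *data;
    ptrdiff_t len;
    ptrdiff_t cap;
} slice;

// Appends v if there is room and returns the updated header by value.
// A 24-byte return value is also too large for registers under SysV: the
// caller supplies a hidden pointer to storage for the result.
slice push(slice s, int v)
{
    assert(s.len >= 0 && s.len <= s.cap);
    if (s.len < s.cap) {
        s.data[s.len++] = v;
    }
    return s;
}
```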
If I reduce len and cap to 32 bits, then the struct falls inside the 16-byte
limit, and len+cap are passed packed in rsi. That perhaps makes the 32-bit
definition worth more consideration…
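Here is a sketch of the 32-bit variant, assuming a little-endian LP64 target like x86-64. `slice32` and `second_eightbyte` are hypothetical names. The helper copies out the bytes of the struct's second eightbyte. Under SysV, that is the value placed in the second argument register, with len in the low half and cap in the high half.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical slice with 32-bit length and capacity: 16 bytes on LP64.
// data fills the first eightbyte; len and cap share the second.
typedef struct {
    int     *data;
    int32_t  len;
    int32_t  cap;
} slice32;

static_assert(sizeof(slice32) == 16, "assumes an LP64 target");

// Returns the raw bits of the second eightbyte (offsets 8..15), i.e. what
// SysV x86-64 would load into rsi if s were the first argument.
uint64_t second_eightbyte(slice32 s)
{
    uint64_t bits;
    memcpy(&bits, (char *)&s + offsetof(slice32, len), sizeof(bits));
    return bits;
}
```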
When I wrote this article five years ago I hadn't actually put this idea
into practice in C, and was mostly speculating. However, per more recent
articles, I've been leaning into this concept the last couple of years in
C, especially for strings, which have the 2-element representation (also
just like Go). No noticeable performance issues there. If anything, it's
been more performant, as an instrumental piece of a more efficient overall
technique.
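Here is a minimal sketch of a 2-element string representation in this spirit. The names `str`, `S`, `str_equals`, and `str_slice` are hypothetical and not taken from the articles. The length travels with the pointer, so substrings need neither copying nor a null terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical pointer+length string, like a Go string; not null-terminated.
typedef struct {
    const char *data;
    ptrdiff_t   len;
} str;

// Wraps a string literal without strlen (sizeof counts the trailing NUL).
#define S(lit) ((str){(lit), (ptrdiff_t)sizeof(lit) - 1})

int str_equals(str a, str b)
{
    return a.len == b.len &&
           (a.len == 0 || memcmp(a.data, b.data, (size_t)a.len) == 0);
}

// Returns the substring [beg, end); it shares storage with s.
str str_slice(str s, ptrdiff_t beg, ptrdiff_t end)
{
    assert(0 <= beg && beg <= end && end <= s.len);
    return (str){s.data + beg, end - beg};
}
```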
I've used them less than strings, so I say this with less confidence, but
I also haven't noticed performance issues copying 3-element slice headers
around. I suspect — but haven't verified — that this is for three reasons:
(1) I use unity builds so most such calls are inlined away, (2) copying
isn't as expensive as it might seem (at least on modern desktop/server
CPUs), and (3) it probably resolves aliasing conflicts, and so enables
optimizations that might not otherwise happen without copying.
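Reason (3) can be illustrated with a hypothetical byte slice (`bytes`, `fill_ref`, `fill_val` are illustrative names). This sketch shows the general aliasing argument, not a measured result. When the header sits behind a pointer, a byte store might modify the header itself. A by-value copy whose address is never taken cannot be reached through `data`.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical byte slice header.
typedef struct {
    unsigned char *data;
    ptrdiff_t      len;
    ptrdiff_t      cap;
} bytes;

// Header behind a pointer: an unsigned char store may alias any object,
// including *s, so the compiler may have to reload s->data and s->len
// on every iteration.
void fill_ref(bytes *s, unsigned char v)
{
    assert(s->len <= s->cap);
    for (ptrdiff_t i = 0; i < s->len; i++) {
        s->data[i] = v;
    }
}

// Header copied by value: s is a local whose address is never taken, so
// stores through s.data cannot change s.data or s.len, and the compiler
// may keep them in registers (or turn the loop into a memset).
void fill_val(bytes s, unsigned char v)
{
    assert(s.len <= s.cap);
    for (ptrdiff_t i = 0; i < s.len; i++) {
        s.data[i] = v;
    }
}
```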
Anyway, thanks for pointing this out!
Thank you for the thorough explanation!
> Smaller or equal! Your struct fat is passed exactly as I suggested, with
> "ptr" in rdi and "len" in rsi. So my statement is true for the 2-element
> struct discussed at that point in my article.
Thanks for the correction! I indeed meant "smaller or equal."
> I've used them less than strings, so I say this with less confidence, but
> I also haven't noticed performance issues copying 3-element slice headers
> around. I suspect — but haven't verified — that this is for three reasons:
> (1) I use unity builds so most such calls are inlined away, (2) copying
> isn't as expensive as it might seem (at least on modern desktop/server
> CPUs), and (3) it probably resolves aliasing conflicts, and so enables
> optimizations that might not otherwise happen without copying.
I have heard of one case where manually unpacking a 3-element struct
greatly improved performance, but I haven't noticed such issues myself
either. By the way, I just learned that if the compiler decides not to
inline a function, it follows the same calling convention even for
`static` functions:
https://godbolt.org/z/rW78hxa1x (GCC and Clang both do this). I did
not find out why, though.
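Here is a sketch for reproducing that kind of experiment, assuming GCC or Clang (`__attribute__((noinline))` is their extension). `slice`, `sum_struct`, and `sum_unpacked` are hypothetical names. Compile this with optimizations and read the assembly to see which convention each static function ended up with. The second function is the manually unpacked form, with only the needed fields passed as scalars.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical 24-byte slice header (LP64).
typedef struct {
    int      *data;
    ptrdiff_t len;
    ptrdiff_t cap;
} slice;

// noinline keeps the call from being inlined away, so its calling
// convention is visible in the generated assembly.
static __attribute__((noinline)) long sum_struct(slice s)
{
    assert(s.len >= 0);
    long total = 0;
    for (ptrdiff_t i = 0; i < s.len; i++) {
        total += s.data[i];
    }
    return total;
}

// Manually unpacked: pointer and length as separate scalar parameters.
static __attribute__((noinline)) long sum_unpacked(const int *data, ptrdiff_t len)
{
    assert(len >= 0);
    long total = 0;
    for (ptrdiff_t i = 0; i < len; i++) {
        total += data[i];
    }
    return total;
}
```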
Regards,
Pengji