~skeeto/public-inbox

Re: Go Slices are Fat Pointers

Pengji Zhang <kunhtkun@gmail.com>
Message ID
<CANOCUiz9ZjRi06pvSDmKsXcHcTiWfAJCeKQUn3EYCh7Tv0poVA@mail.gmail.com>
Hi Chris,

I just came across this post and I quite like it. However, this line
confuses me:

> In typical C implementations, the structure fields would be passed
> practically, if not exactly, same way as the individual parameters
> would have been passed, so it’s really no less efficient.

I think that ABI function calling conventions control this, so the C
implementation cannot do much about it. For example, on x86-64, SysV
requires that only parameters smaller than 16 bytes would be passed in
this way [0], and Windows `__fastcall` convention only allows 8 bytes.

So I think functions that take fat pointers should be less efficient
in general, or did I miss or misunderstand something? Thanks!

[0]: https://godbolt.org/z/zqrzvKxfz

Best,
Pengji

Re: Go Slices are Fat Pointers

Message ID
<20240617015613.ynqfh2n7srgg33v4@nullprogram.com>
In-Reply-To
<CANOCUiz9ZjRi06pvSDmKsXcHcTiWfAJCeKQUn3EYCh7Tv0poVA@mail.gmail.com>
Thanks for writing, Pengji!

> on x86-64, SysV requires that only parameters smaller than 16 bytes 
> would be passed in this way

Smaller or equal! Your struct fat is passed exactly as I suggested, with 
"ptr" in rdi and "len" in rsi. So my statement is true for the 2-element 
struct discussed at that point in my article.
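
As a rough illustration of the 2-element case, here is a minimal sketch. The names `struct fat`, `sum_fat`, and `sum_split` are hypothetical, and the size check assumes a target with 8-byte pointers. Under the SysV x86-64 rules discussed here, both functions would be expected to receive their arguments in the same two registers.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical 2-element fat pointer: a pointer plus a length.
struct fat {
    int      *ptr;
    ptrdiff_t len;
};

// On typical x86-64 targets this is two eightbytes (16 bytes), which is
// within the SysV limit for passing a struct in integer registers.
static_assert(sizeof(struct fat) <= 16, "expected at most two eightbytes");

// Fat pointer passed by value.
long sum_fat(struct fat s)
{
    long total = 0;
    for (ptrdiff_t i = 0; i < s.len; i++) {
        total += s.ptr[i];
    }
    return total;
}

// The same function with the fields passed as individual parameters.
long sum_split(int *ptr, ptrdiff_t len)
{
    long total = 0;
    for (ptrdiff_t i = 0; i < len; i++) {
        total += ptr[i];
    }
    return total;
}
```

Comparing the compiler output for the two functions (for example with `-O2 -S`) is one way to check the claim on a given target.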

However, you're absolutely right about 3-element struct slices, which is 
where I was heading in my article. I hadn't realized the "two eightbytes" 
limit specified in the SysV ABI under 3.2.3-5c. That's too bad! I wish it 
was just a little larger, perhaps 32 bytes. I'm guessing 16 bytes was 
chosen for congruence with SSE. Looks like AArch64 is basically in the 
same boat. I've just added a small note about your correction.

If I reduce len and cap to 32 bits, then it falls inside the 16-byte 
limit, and len+cap are passed packed in rsi. Which perhaps makes that 
definition worth more consideration…
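
To make the size arithmetic concrete, here is a small sketch of both header layouts. The names `struct slice64`, `struct slice32`, and `push32` are made up for illustration, and the size assertions assume 8-byte pointers and an 8-byte `ptrdiff_t`, as on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical 3-element slice header with pointer-sized len and cap.
struct slice64 {
    int      *data;
    ptrdiff_t len;
    ptrdiff_t cap;
};

// The same header with len and cap narrowed to 32 bits.
struct slice32 {
    int    *data;
    int32_t len;
    int32_t cap;
};

// Assumes 8-byte pointers and 8-byte ptrdiff_t (e.g. x86-64).
static_assert(sizeof(struct slice64) == 24, "three eightbytes: over the limit");
static_assert(sizeof(struct slice32) == 16, "two eightbytes: within the limit");

// Append one element if there is spare capacity and return the updated
// header. The 16-byte header is passed and returned by value.
struct slice32 push32(struct slice32 s, int value)
{
    if (s.len < s.cap) {
        s.data[s.len++] = value;
    }
    return s;
}
```

The trade-off in this sketch is that 32-bit fields cap a slice at about two billion elements.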

When I wrote this article five years ago I hadn't actually put this idea 
into practice in C, and was mostly speculating. However, per more recent 
articles, I've been leaning into this concept the last couple of years in 
C, especially for strings, which have the 2-element representation (also 
just like Go). No noticeable performance issues there. If anything, it's 
been more performant, as an instrumental piece of a more efficient overall 
technique.

I've used them less than strings, so I say this with less confidence, but 
I also haven't noticed performance issues copying 3-element slice headers 
around. I suspect — but haven't verified — that this is for three reasons: 
(1) I use unity builds so most such calls are inlined away, (2) copying 
isn't as expensive as it might seem (at least on modern desktop/server 
CPUs), and (3) it probably resolves aliasing conflicts, and so enables 
optimizations that might not otherwise happen without copying.
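
Reason (3) can be sketched as follows. This is only an illustration of the kind of aliasing conflict a by-value copy might avoid, not a measured result. The names `struct slice`, `fill_by_pointer`, and `fill_by_value` are hypothetical, and the element type is deliberately chosen to match the type of `len` so that the conflict can arise.

```c
#include <stddef.h>

// Hypothetical slice header whose elements have the same type as len.
struct slice {
    ptrdiff_t *data;
    ptrdiff_t  len;
    ptrdiff_t  cap;
};

// By pointer: a store to s->data[i] has the same type as s->len, so the
// compiler generally must assume the store could modify s->len and
// reload it on each iteration.
void fill_by_pointer(struct slice *s, ptrdiff_t value)
{
    for (ptrdiff_t i = 0; i < s->len; i++) {
        s->data[i] = value;
    }
}

// By value: s is a local copy whose address is never taken, so stores
// through s.data cannot change s.len, and it can stay in a register.
void fill_by_value(struct slice s, ptrdiff_t value)
{
    for (ptrdiff_t i = 0; i < s.len; i++) {
        s.data[i] = value;
    }
}
```

Whether this matters in practice depends on the compiler, the element type, and whether the call is inlined.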

Anyway, thanks for pointing this out!

Re: Go Slices are Fat Pointers

Pengji Zhang <kunhtkun@gmail.com>
Message ID
<CANOCUixTpqA+X_fg+UqVnSv81cir88ab3qo6Hfu0REze0aELXw@mail.gmail.com>
In-Reply-To
<20240617015613.ynqfh2n7srgg33v4@nullprogram.com>
Thank you for the thorough explanation!

> Smaller or equal! Your struct fat is passed exactly as I suggested, with
> "ptr" in rdi and "len" in rsi. So my statement is true for the 2-element
> struct discussed at that point in my article.

Thanks for the correction! I indeed meant "smaller or equal."

> I've used them less than strings, so I say this with less confidence, but
> I also haven't noticed performance issues copying 3-element slice headers
> around. I suspect — but haven't verified — that this is for three reasons:
> (1) I use unity builds so most such calls are inlined away, (2) copying
> isn't as expensive as it might seem (at least on modern desktop/server
> CPUs), and (3) it probably resolves aliasing conflicts, and so enables
> optimizations that might not otherwise happen without copying.

I have heard of one case where manually unpacking a 3-element struct
could greatly improve the performance, but I myself have not noticed
such issues either. By the way, I just learned that if the compiler
decides to not inline a function, it will follow the same calling
convention even for `static` functions:
https://godbolt.org/z/rW78hxa1x (GCC and Clang both do this). I did
not find out why, though.

Regards,
Pengji