~skeeto/public-inbox

4 3

Re: A simple, arena-backed, generic dynamic array for C

Matthew Fernandez <matthew.fernandez@gmail.com>
Details
Message ID
<c839a971-9441-6633-24e4-bd213c172281@gmail.com>
DKIM signature
missing
Download raw message
Great article as always!

On 10/6/23 00:05, Chris Wellons wrote:
> *A simple, arena-backed, generic dynamic array for C*
> https://nullprogram.com/blog/2023/10/05/ 
> 
> Copying over old data includes a special check for zero-length inputs, 
> because, quite frustratingly <https://nullprogram.com/blog/2023/02/11/>, 
> |memcpy| does not accept null even when the length is zero.
Do you know what the motivation for this is? I too have tripped over 
this in the past and I cannot think of any useful compiler optimisation 
or platform behaviour that is enabled by this assumption.

Re: A simple, arena-backed, generic dynamic array for C

Matthew Fernandez <matthew.fernandez@gmail.com>
Details
Message ID
<dd0778ae-6297-a002-234a-d465b5715fc1@gmail.com>
In-Reply-To
<c839a971-9441-6633-24e4-bd213c172281@gmail.com> (view parent)
DKIM signature
missing
Download raw message
Breaking the rules and replying to my own post as I learned some things 
that may answer the question and be of interest to you/others.

On 10/10/23 10:01, Matthew Fernandez wrote:
> On 10/6/23 00:05, Chris Wellons wrote:
>> *A simple, arena-backed, generic dynamic array for C*
>> https://nullprogram.com/blog/2023/10/05/
>> Copying over old data includes a special check for zero-length inputs, 
>> because, quite frustratingly 
>> <https://nullprogram.com/blog/2023/02/11/>, |memcpy| does not accept 
>> null even when the length is zero.
> Do you know what the motivation for this is? I too have tripped over 
> this in the past and I cannot think of any useful compiler optimisation 
> or platform behaviour that is enabled by this assumption.

Today I tripped over `qsort(NULL, 0, …)` being UB and had the same 
confusion. I went down the rabbit hole and surfaced:

[0] Paul Eggert eggert at cs.ucla.edu, Wed Oct 26 17:23:44 UTC 2022
> Although qsort(NULL, 0, ...) works fine on most practical platforms, as 
> Clive noted there is (or was) an oddball platform or two that is (or 
> was) verrrry picky about pointers, and for a program like zic where 
> portability is more important than performance, it's nice if zic runs 
> even on oddballs. Clive, do you happen to know what these platforms are 
> (or were), and whether they're still supported?

[1] Clive D.W. Feather clive at davros.org, Thu Oct 27 06:38:42 UTC 2022
> Sorry, but I have no idea. We didn't always get told - if someone at the
> meeting said "I know an platform that does X", we believed them.
> 
> There was a longstanding meme in the meetings (and on comp.std.c) that "if
> you don't understand the reasoning behing something in the C Standard, the
> answer is the IBM AS400".

[0]: https://mm.icann.org/pipermail/tz/2022-October/032107.html
[1]: https://mm.icann.org/pipermail/tz/2022-October/032115.html

Re: A simple, arena-backed, generic dynamic array for C

Details
Message ID
<20231012021513.7wwytohzlb3ctukd@nullprogram.com>
In-Reply-To
<dd0778ae-6297-a002-234a-d465b5715fc1@gmail.com> (view parent)
DKIM signature
missing
Download raw message
I don't know the history, but C and C++ generally regard null pointers as 
super special — very much not a valid pointer to a zero-sized object. I 
hadn't thought about it with qsort before, but other cases include fwrite 
and fread. C++ takes it even further: When the standard says a function 
accepts an "array" (e.g. std::ostream::write) you're not even allowed to 
use zero as a count/nmemb with a non-null pointer because arrays cannot be 
zero length. It's nuts, like they were trying to make gotchas.

Though whatever historical circumstance established this should not matter 
today, GCC uses it as an optimization hint. If a pointer is passed to one 
of these functions, then it's assumed not null, and that hint propagates 
forward potentially removing null pointer checks and such.

A half workaround is to write your own copy/move/set functions with your 
own names and sane semantics, then use those instead. However, you still 
have to watch for issues like adding zero to a null pointer which, again, 
_should_ be something you could do. It's natural and eliminates a bunch of 
needless edge case checks, because they're not _really_ edge cases. But 
it's used as an optimization hint, so you can't do it.

I wish these standard functions accepted null pointers when given zero 
counts, and that adding/subtracting zero with a null pointer was defined 
to yield a null pointer. It would simplify a great deal of code, eliminate 
a ton of real world bugs, and probably cost virtually nothing in missed 
optimizations. (I suspect most cases when a genuine slowdown is observed, 
it's because the program had undefined null pointer use.) I'm fine with 
leaving non-null offsets and counts on null pointers as undefined.

Re: A simple, arena-backed, generic dynamic array for C

Matthew Fernandez <matthew.fernandez@gmail.com>
Details
Message ID
<bc0af4c9-0a11-ea9d-d031-94b7420732e1@gmail.com>
In-Reply-To
<20231012021513.7wwytohzlb3ctukd@nullprogram.com> (view parent)
DKIM signature
missing
Download raw message

On 10/12/23 13:15, Christopher Wellons wrote:
> Though whatever historical circumstance established this should not 
> matter today, GCC uses it as an optimization hint. If a pointer is 
> passed to one of these functions, then it's assumed not null, and that 
> hint propagates forward potentially removing null pointer checks and such.

In my prior reading, I came across a claim that the non-null requirement 
allowed a pre-fetching optimisation¹ but I wrote this off as spurious 
because I believe pre-fetching a non-null address at which there are 0 
array elements can still trap on some platforms. Your theory makes much 
more sense; thanks! IIUC you’re saying, though `qsort` may not directly 
use the pointer when `nmemb` is 0, it may call another function that 
performs non-null checks. With this assumption in the standard, the 
callee can be inlined/specialised to remove the check. Nice!

¹ https://mm.icann.org/pipermail/tz/2022-October/032111.html

> A half workaround is to write your own copy/move/set functions with your 
> own names and sane semantics, then use those instead.

Ah yes, I guess this in itself is a reasonable supporting argument too. 
If the user really wants no UB here, the wrapper they need to implement 
is no less efficient than the equivalent library implementation would be.

Re: A simple, arena-backed, generic dynamic array for C

Details
Message ID
<euurl7nr6tfmith3md4j2czzzalysn2emse5b4i7p36bhsfrum@7fouw2fw232z>
In-Reply-To
<bc0af4c9-0a11-ea9d-d031-94b7420732e1@gmail.com> (view parent)
DKIM signature
missing
Download raw message
Hi Matthew,

> In my prior reading, I came across a claim that the non-null
> requirement allowed a pre-fetching optimisation

GCC's `__builtin_prefetch()` allows the argument to be NULL. And GCC has
been ported to a good amount of platforms. So I doubt that's a valid
reason.

> `qsort` may not directly use the pointer when `nmemb` is 0, it may
> call another function that performs non-null checks.

This is less about "calling another function" and more about GCC's
`nonnull` attribute (or similar). For example:

	extern void f(void *p) __attribute(( nonnull(1) ));
	
	int isNull(void *p)
	{
	    f(p);
	    return p == 0;
	}

In the above snippet, the nonnull attribute in the function prototype
informs GCC that the argument 1 (p in this case) cannot be null. And
since the `isNull` function calls `f(p)`, GCC assumes that p is non-null
and will elide the `p == 0` check and always return 0:
https://godbolt.org/z/no35fP9fG

Also note that the compiler can assume that arguments to qsort/memcpy
etc are non-null *even if the libc doesn't mark it as such* since the
standard says it's undefined.

> If the user really wants no UB here, the wrapper they need to
> implement is no less efficient than the equivalent library
> implementation would be.

I have extremely high doubts that the speed benefit of removing a couple
null checks outweight the development friction created by them. AFAIK,
linux kernel outright disables this "optimization" by passing
`-fno-delete-null-pointer-checks` explicitly.

- NRK
Reply to thread Export thread (mbox)