~skeeto/public-inbox

2 2

strcpy: a niche function you don't need

Details
Message ID
<ad2fc7759fba9155bbe6618dee6c8d70@dbj.org>
DKIM signature
missing
Download raw message
I am not a particular fan of "Annex K", but it seems Renie Urban
has a rather elaborate implementation:

https://github.com/rurban/safeclib

I might dare to think it seems you perhaps did not know about it at the 
time of writing :)

And. Interestingly WIN32 provides, WIN32 replacement for all CRT string 
manipulation:
(although in some kind of "crazy camel" functions naming style)

https://docs.microsoft.com/en-us/windows/win32/api/strsafe/

"interestingly" because you do not seem fond of CRT on the Windows 
platform. Neither am I.
But I assume you knew about Win32 <strsafe.h>


Kind regards
Details
Message ID
<20220105004139.rve2cgjwl5y7fhlb@nullprogram.com>
In-Reply-To
<ad2fc7759fba9155bbe6618dee6c8d70@dbj.org> (view parent)
DKIM signature
missing
Download raw message
I was aware of its sibling Intel fork when writing my article, but not 
this one until a couple weeks ago.

While it's improved since I wrote my article (July), this still doesn't 
pass my threshold for practical. Part of why I hadn't found it is that it 
wasn't part of a stable Debian release until this past August. That it's 
gradually being introduced to distributions makes it slightly more 
practical. However, If I wanted to integrate it into my own projects, 
there's no embeddable safeclib.[ch], a la sqlite3.[ch], that I can just 
drop into my project. I have to somehow drive their Autoconf build as part 
of my build. That's a lot of complexity given how little this library 
provides.

The only reason I can imagine anyone would use it is because they were 
confused, or they're doing it on the behalf of a confused person.

> But I assume you knew about Win32 <strsafe.h>

I did not, thanks! Some of these could be useful if I was writing some 
kind of specialized, CRT-free code (e.g. my w64devkit command aliases). 
Curiously, StringCchCopyNA, despite being quite the mouthful, manages to 
be better than strlcpy, which has broken semantics (see Linus's strlcpy 
rant). Though right next to it is StringCchCopyA, which is as broken as 
strlcpy.

I like that there's the equivalent of an RSIZE_MAX, about the only 
salvageable part of Annex K (and optional at that, which makes Annex K 
even worse).

But still like Annex K, it's confusing two different needs and so ends up 
being bad at both. The two needs are:

1. Handling untrusted inputs with unknown properties. This is particularly 
important in kernel space, and (guessing) it's why the strsafe.h functions 
are all documented to work "even if the contents of the source string 
change during the operation." In other words, they still work correctly 
when shared with a hostile thread/process.

2. Protecting against program defects. The programmer has made a mistake, 
and the additional size parameters help to catch those mistakes before the 
program does something dangerous. This is the entire purpose of having a 
restricted size_t. Per C11: "Extremely large object sizes are frequently a 
sign that an object’s size was calculated incorrectly." It's also the only 
reason to check for null pointers.

Annex K is unequipped for (1) — its behavior is far too loose — and can 
only do (2). As such, there's no way to safely or correctly handle a 
runtime constraint violation since the program is in an unanticipated 
state as the result of a defect. The only reasonable action is to abort 
the process just like an assert failure, a la _FORTIFY_SOURCE. Supporting 
a configurable constraint handler suggests either trying to support (1) as 
well as (2), or confusion about the meaning of a constraint violation.

Though kind of overkill, you can cover (2) with a (1) library by asserting 
that return value is a success. You cannot cover (1) with a (2) library.

strsafe.h appears to be designed for (1), which makes sense as part of 
Win32, vs. a CRT. However, some string functions still require null 
terminated source strings, otherwise it's undefined behavior. That's 
unsuitable for (1) — an untrusted input could have any content — so 
whoever designed it was thinking about (2) instead of (1). They got 
confused and put landmines in their (1) library.

I hardly ever have use for a (1) library since I just write boring old 
application code. A well-designed (2) could be useful, but even if Annex K 
were readily available, it still wouldn't fit the bill, not by a long 
shot. It doesn't actually address the hardest problem: computing sizes 
without overflow or underflow. Nor the second hardest problem: indexing, 
especially off-by-one mistakes. RSIZE_MAX can catch some of this, but it's 
far from the actual defect.

It would have been more effective if the standards committee had simply 
standardized BSD reallocarray instead of any of Annex K. It actually 
addresses a common case of the hardest problem. If you want better safety, 
move size computations into the standard library (or the language itself) 
where they'll be done carefully.

Side note: My only complaint about reallocarray is that it should have had 
another size parameter, a base size, so that it computes (base+nmemb*size) 
all with overflow checks. That covers resizing flexible array members.
Details
Message ID
<CAAiNwPoO3VBxQiumLczEGvLFR5dDyPBYOiSaZqBUDOsnZyyKzw@mail.gmail.com>
In-Reply-To
<20220105004139.rve2cgjwl5y7fhlb@nullprogram.com> (view parent)
DKIM signature
missing
Download raw message
A very comprehensive answer, thanks :)

" ... if I was writing some kind of specialized, CRT-free code..."

One can go drama wise and ask: "Why would anyone use CRT inside WIn32
Apps?" ... It might be easy to imagine a set of macros using the Win32
and replacing the CRT code on the outside. Completely.

ps: There is also this interesting development (loosely related):

memccpy

https://developers.redhat.com/blog/2019/08/12/efficient-string-copying-and-concatenation-in-c

Admittedly of a much narrower scope...

Kind regards ...


On Wed, 5 Jan 2022 at 01:41, Christopher Wellons
<wellons@nullprogram.com> wrote:
>
> I was aware of its sibling Intel fork when writing my article, but not
> this one until a couple weeks ago.
>
> While it's improved since I wrote my article (July), this still doesn't
> pass my threshold for practical. Part of why I hadn't found it is that it
> wasn't part of a stable Debian release until this past August. That it's
> gradually being introduced to distributions makes it slightly more
> practical. However, If I wanted to integrate it into my own projects,
> there's no embeddable safeclib.[ch], a la sqlite3.[ch], that I can just
> drop into my project. I have to somehow drive their Autoconf build as part
> of my build. That's a lot of complexity given how little this library
> provides.
>
> The only reason I can imagine anyone would use it is because they were
> confused, or they're doing it on the behalf of a confused person.
>
> > But I assume you knew about Win32 <strsafe.h>
>
> I did not, thanks! Some of these could be useful if I was writing some
> kind of specialized, CRT-free code (e.g. my w64devkit command aliases).
> Curiously, StringCchCopyNA, despite being quite the mouthful, manages to
> be better than strlcpy, which has broken semantics (see Linus's strlcpy
> rant). Though right next to it is StringCchCopyA, which is as broken as
> strlcpy.
>
> I like that there's the equivalent of an RSIZE_MAX, about the only
> salvageable part of Annex K (and optional at that, which makes Annex K
> even worse).
>
> But still like Annex K, it's confusing two different needs and so ends up
> being bad at both. The two needs are:
>
> 1. Handling untrusted inputs with unknown properties. This is particularly
> important in kernel space, and (guessing) it's why the strsafe.h functions
> are all documented to work "even if the contents of the source string
> change during the operation." In other words, they still work correctly
> when shared with a hostile thread/process.
>
> 2. Protecting against program defects. The programmer has made a mistake,
> and the additional size parameters help to catch those mistakes before the
> program does something dangerous. This is the entire purpose of having a
> restricted size_t. Per C11: "Extremely large object sizes are frequently a
> sign that an object’s size was calculated incorrectly." It's also the only
> reason to check for null pointers.
>
> Annex K is unequipped for (1) — its behavior is far too loose — and can
> only do (2). As such, there's no way to safely or correctly handle a
> runtime constraint violation since the program is in an unanticipated
> state as the result of a defect. The only reasonable action is to abort
> the process just like an assert failure, a la _FORTIFY_SOURCE. Supporting
> a configurable constraint handler suggests either trying to support (1) as
> well as (2), or confusion about the meaning of a constraint violation.
>
> Though kind of overkill, you can cover (2) with a (1) library by asserting
> that return value is a success. You cannot cover (1) with a (2) library.
>
> strsafe.h appears to be designed for (1), which makes sense as part of
> Win32, vs. a CRT. However, some string functions still require null
> terminated source strings, otherwise it's undefined behavior. That's
> unsuitable for (1) — an untrusted input could have any content — so
> whoever designed it was thinking about (2) instead of (1). They got
> confused and put landmines in their (1) library.
>
> I hardly ever have use for a (1) library since I just write boring old
> application code. A well-designed (2) could be useful, but even if Annex K
> were readily available, it still wouldn't fit the bill, not by a long
> shot. It doesn't actually address the hardest problem: computing sizes
> without overflow or underflow. Nor the second hardest problem: indexing,
> especially off-by-one mistakes. RSIZE_MAX can catch some of this, but it's
> far from the actual defect.
>
> It would have been more effective if the standards committee had simply
> standardized BSD reallocarray instead of any of Annex K. It actually
> addresses a common case of the hardest problem. If you want better safety,
> move size computations into the standard library (or the language itself)
> where they'll be done carefully.
>
> Side note: My only complaint about reallocarray is that it should have had
> another size parameter, a base size, so that it computes (base+nmemb*size)
> all with overflow checks. That covers resizing flexible array members.
Reply to thread Export thread (mbox)