The article states:
> Disabling PIE with -no-pie is necessary in real applications or else
> strings won’t work.
Can you elaborate? Maybe it's obvious if you know the concrete details
of how PIE works but I only have a high level overview and so it's not
obvious to me why -no-pie is necessary here.
- NRK
Good question. I hadn't investigated when I wrote the article, and I still
didn't know the specifics when I saw your message here. Investigating the
segfault in GDB, I found that all my counted strings were initialized with
null pointers. It smelled like a PIE issue, and the problem went away when
I disabled PIE at link time.
But I got to the bottom of it! For the sake of those following along, a
u-config string is a 16-byte header: a pointer-length tuple. After macro
expansion, initialization looks like:
s8 example = (s8){"example", sizeof("example")-1};
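For readers without the source at hand, here is a minimal sketch of what that line expands from. The field names, the `u8` typedef, the `S` macro, and the `make_example` function are assumptions made for illustration, not copied from u-config. On a typical 64-bit target the struct is the 16-byte header described above.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char u8;

// Counted string: a pointer plus a length, no terminator required.
// Field names here are illustrative assumptions.
typedef struct {
    u8       *s;
    ptrdiff_t len;
} s8;

// Hypothetical wrapper macro: expands a string literal into the
// pointer-length initializer. sizeof counts the null terminator,
// hence the -1.
#define S(lit) (s8){(u8 *)(lit), (ptrdiff_t)sizeof(lit) - 1}

s8 make_example(void)
{
    s8 example = S("example");
    assert(example.len == 7);  // seven characters, terminator excluded
    return example;
}
```

The pointer field is the important part for what follows: its value is the address of the literal, which is not known until the image is placed in memory.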
OpenBSD Clang puts a copy of `example` in relro / rodata. To initialize,
Clang copies it onto the stack with a movups load and movaps store (good
thing I aligned the stack, eh!). Of course, the relro entry can't contain
a RIP-relative address, so it gets a relocation entry. Linked as PIE, the
dynamic loader has to fill these out, but with -static there is no loader,
and so nobody ever patches the addresses. This should have been detected
as a linker error, so I personally count this as a linker bug.
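To make the mechanism concrete, the following sketch models in plain C what the generated code does. It is a hand-written stand-in, not compiler output: `example_image` and `load_example` are hypothetical names, and an optimizing compiler is free to emit something different for this source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char u8;
typedef struct { u8 *s; ptrdiff_t len; } s8;

// Stand-in for the constant image the compiler emits. The pointer field
// holds an absolute address, so in a position-independent image it
// cannot be finalized at link time and needs a relocation entry.
static const s8 example_image = {(u8 *)"example", 7};

s8 load_example(void)
{
    s8 example;
    // Models the 16-byte load and store onto the stack.
    memcpy(&example, &example_image, sizeof(example));
    // If nobody applied the relocation, this pointer is still the
    // unpatched value from the file, which is the failure seen in GDB.
    assert(example.s != NULL);
    return example;
}
```

In a correctly linked or loaded image the assertion holds. In the broken static PIE build, the equivalent of `example_image.s` was never patched, so every string came out with a null pointer.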
If I use -fno-pie, it goes into rodata instead of relro, but now it's a
linker error (correct!) because the object isn't relocatable. So I also
need -no-pie. If I omit -fno-pie but keep -no-pie, Clang puts it in relro,
but the linker patches the addresses at link time as if it were rodata, so
it works out. It's insufficient to use -z,norelro because ld still expects
the dynamic loader to patch it.
Testing Clang on my Debian system, I'm seeing the same mov{a,u}ps and
relro, so that's pretty normal, not unique to OpenBSD. I hadn't noticed
before that Clang produces relocations for my counted string style, which
means there's a load-time cost for every individual initialization in a
relocatable image. (By load-time cost, I mean that the image is a little
larger and the dynamic linker has to patch an entry.) I'm not thrilled
about this.
I'll consider how I might add this information to the article. I'd want to
elaborate more on these terms.
Addendum: OpenBSD's toolchain isn't in healthy shape, which discouraged me
from investigating issues beyond finding fixes. GDB is more broken than
usual (it's quite old), as is Binutils generally. Notice, for example, I
used __start
in my port but _start in the article? There seems to be a typo in ld
depending on whether you use -nostdlib or -nostartfiles. To investigate
the above I had to use llvm-objdump because Binutils objdump produced
incorrect output.
As with Ubuntu, OpenBSD enables extra security features by default, and
they stay enabled even when the user requests mutually exclusive options.
So the
results are broken outside of the small, if most common, set of sanctioned
build configurations. Users must claw back defaults one flag at a time,
with those flags changing between releases.