~skeeto/public-inbox

Re: Practical libc-free threading on Linux

Message ID: <262I6LMAT29KT.1ZIXKWXJCSR0O@rnpnr.xyz>

Hi! Thanks for the cool post.

One thing I noticed when I applied this is that GDB is unable to
determine a valid call graph when the thread was spawned using
this method. Applying your method directly gives:

bt -frame-info short
#0  linux_render_thread_entry (stack=0x7ffff7b6bfc0)
#1  0x00007fffbc600000 in ?? ()
#2  0x00007fffbce00000 in ?? ()
#3  0x00007ffff7a631d0 in ?? ()
#4  0x0000555575576158 in linux_ctx ()
#5  0x00005555755761d0 in linux_ctx ()
#6  0x0000000000000000 in ?? ()

linux_ctx is a global variable which is in no way referenced by my
stack memory; it simply exists in the same program. To solve this
I came up with the following version of newthread():

/* NOTE: based on code from nullprogram (Chris Wellons) */
__attribute__((naked))
static i64 new_thread(void *stack_base)
{
	asm volatile (
		"mov  %%rdi,    %%rsi\n" // arg2 = new stack
		"mov  $0x50F00, %%edi\n" // arg1 = clone flags (VM|FS|FILES|SIGHAND|THREAD|SYSVSEM)
		"mov  $56,      %%eax\n" // SYS_clone
		"syscall\n"
		"test %%eax,    %%eax\n" // don't mess with the calling thread's stack
		"jne  1f\n"
		"mov  %%rsp,    %%rdi\n" // child: rdi = stack head, which holds the entry point
		"sub  $8,       %%rsp\n" // reserve a slot for a 0 return address (assumes zeroed stack memory)
		"push (%%rdi)\n"         // push the entry point back onto the stack for use by ret
		"1: ret\n"
		: : : "rax", "rcx", "rsi", "rdi", "r11", "memory"
	);
}

This gives a much nicer call graph:
bt -frame-info short
#0  linux_render_thread_entry (stack=0x7ffff7b6bfc0)
#1  0x0000000000000000 in ?? ()
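
To make the child-side stack shuffling above easier to follow, here is a
toy model in plain C. It is only a sketch: toy_stack, toy_push, toy_pop
and toy_child_path are made-up names, the stack is an ordinary array of
8-byte slots, and nothing here calls clone. It mirrors the four child
instructions of the first new_thread() and shows what ends up in the
return-address slot that a debugger would read.

```c
#include <stddef.h>
#include <stdint.h>

// Toy model of the child's stack: slots[] stands in for 8-byte stack
// slots and sp is an index into it (lower index = lower address).
typedef struct {
	uint64_t *slots;
	size_t    sp;
} toy_stack;

static void toy_push(toy_stack *s, uint64_t v)
{
	s->slots[--s->sp] = v;
}

static uint64_t toy_pop(toy_stack *s)
{
	return s->slots[s->sp++];
}

// Mirrors the child-side instructions of the first new_thread().
// On entry, sp indexes the slot that holds the entry point.
// Returns the address that "ret" would jump to; afterwards
// s->slots[s->sp] is what the entry function sees as its return address.
static uint64_t toy_child_path(toy_stack *s)
{
	size_t head = s->sp;          // mov  %rsp, %rdi
	s->sp -= 1;                   // sub  $8,   %rsp  (slot is reserved, not written)
	toy_push(s, s->slots[head]);  // push (%rdi)
	return toy_pop(s);            // ret
}
```

Starting from zeroed memory with the entry point in the head slot,
toy_child_path() returns the entry point and leaves sp on a slot that
still reads 0. Note that sub only reserves the slot, so the 0 is there
only if the stack memory was zero to begin with, as it is for a fresh
anonymous mmap.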

If you prefer to have new_thread() at the bottom of the call
graph, you can push the value of the instruction pointer onto the
stack, à la the call instruction:

__attribute__((naked))
static i64 new_thread(void *stack_base)
{
	asm volatile (
		"mov  %%rdi,    %%rsi\n"  // arg2 = new stack
		"mov  $0x50F00, %%edi\n"  // arg1 = clone flags (VM|FS|FILES|SIGHAND|THREAD|SYSVSEM)
		"mov  $56,      %%eax\n"  // SYS_clone
		"syscall\n"
		"test %%eax,    %%eax\n"  // don't mess with the calling thread's stack
		"jne  1f\n"
		"sub  $16,      %%rsp\n"  // reserve 16 bytes for a 0 return address (assumes zeroed stack memory)
		"lea  0(%%rip), %%rdi\n"  // rdi = address of the next instruction
		"push %%rdi\n"            // push a return to this location onto the child's stack
		"lea  24(%%rsp), %%rdi\n" // rdi = original stack head, which holds the entry point
		"push (%%rdi)\n"          // push the entry point back onto the stack for use by ret
		"1: ret\n"
		: : : "rax", "rcx", "rsi", "rdi", "r11", "memory"
	);
}

bt -frame-info short
#0  linux_render_thread_entry (stack=0x7ffff7b6bfc0)
#1  0x0000555555557cac in new_thread (stack_base=0x3)
#2  0x0000000000000000 in ?? ()

If you want the value of stack_base to be correct you have to do
some more function prologue work (not shown because I don't think
it's valuable).

As a final note, I don't think it's possible to completely eliminate
the bottommost call from the call graph using your method, where the
entry point is at the top of the stack. musl's pthread_create()
puts two extra guard pages at the top of the stack (mmap with
PROT_NONE) and I'm guessing GDB uses this to determine the true
end of the stack (by accessing the guard page and handling the
generated SIGSEGV). In our case the top of the stack contains the
entry function pointer, which, without the aforementioned 0 bytes,
looks like a return address.
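
For anyone who wants to experiment with the guard page idea, here is a
minimal sketch of a stack allocation that leaves an inaccessible region
past the high-address end. It is not taken from musl's source, and
whether GDB actually probes such a region is only the guess stated
above. The helper name alloc_stack_with_top_guard and its parameters
are made up for illustration.

```c
#define _DEFAULT_SOURCE  // for MAP_ANONYMOUS under strict -std=c11
#include <stddef.h>
#include <sys/mman.h>

// Hypothetical helper: reserves size + guard bytes with no access, then
// makes the low "size" bytes readable and writable. The "guard" bytes at
// the high end stay PROT_NONE, so touching them raises SIGSEGV.
// Both arguments are assumed to be multiples of the page size.
// Returns the low address of the usable region, or NULL on failure.
static void *alloc_stack_with_top_guard(size_t size, size_t guard)
{
	char *p = mmap(NULL, size + guard, PROT_NONE,
	               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		return NULL;
	}
	if (mprotect(p, size, PROT_READ | PROT_WRITE) != 0) {
		munmap(p, size + guard);
		return NULL;
	}
	return p;
}
```

The initial stack pointer for the new thread would then be the returned
address plus size, with the guard sitting directly above it. Release
the whole thing with munmap(p, size + guard).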

Cheers!
Randy

--
https://rnpnr.xyz/
GPG Fingerprint: B8F0 CF4C B6E9 415C 1B27 A8C4 C8D2 F782 86DF 2DC5

Re: Practical libc-free threading on Linux

Message ID: <20241202001144.yum66hk5tavw6una@nullprogram.com>
In-Reply-To: <262I6LMAT29KT.1ZIXKWXJCSR0O@rnpnr.xyz>

Good point. The important information is all still there, just with extra 
junk. Even normal, conventional programs have messy, noisy stack traces, 
so I'm accustomed to mentally ignoring it, and I hadn't really noticed.

Before your message, I had thought it was a matter of zeroing rbp in the 
new thread to establish a "bottom" of the stack pointer chain, but I tried 
it similarly to your hack here, and it didn't work! I'm surprised pushing 
a zero return pointer didn't entirely work either. In your example there's 
still that one extra junk frame. This seems like an obvious deficiency in 
GDB not treating address zero as a stop sign.

I know it's not strictly tied to DWARF because backtraces are clean on the 
main thread, even with an assembly entry point. Maybe it's the guard page 
like you said, except that the main stack has the argv, envp, and auxv 
arrays beyond the last stack frame. I couldn't quickly locate the code in 
GDB responsible for it, so I have no idea how GDB decides to stop.

I gave LLDB a shot at generating backtraces, and either zeroing rbp or 
pushing a zero return pointer is sufficient to communicate a backtrace 
sentinel. The obvious stuff works. (This result does not surprise me.)

Re: Practical libc-free threading on Linux

Message ID: <33L8DJADTJCHI.21PCB399MCQ2U@rnpnr.xyz>
In-Reply-To: <20241202001144.yum66hk5tavw6una@nullprogram.com>

Christopher Wellons <wellons@nullprogram.com> wrote:
> Before your message, I had thought it was a matter of zeroing rbp in the
> new thread to establish a "bottom" of the stack pointer chain, but I tried
> it similarly to your hack here, and it didn't work! I'm surprised pushing
> a zero return pointer didn't entirely work either. In your example there's
> still that one extra junk frame. This seems like an obvious deficiency in
> GDB not treating address zero as a stop sign.

Zeroing rbp is something which musl's clone does, but I don't think
GDB uses it at all. The ABI allows rbp to be used as just another
general-purpose register, so even though compilers push rbp onto
the stack immediately after the saved instruction pointer in the
function prologue, GDB might just ignore it. (I'm not really sure,
but my guess is pushing rbp is just for keeping the stack aligned,
since the only thing needed to return to the caller is the pushed
instruction pointer).
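
For reference, when code is built with frame pointers, the rbp saved in
each prologue points at the caller's saved rbp, so the saved values
form a linked list with each return address sitting one slot above. A
minimal sketch of walking such a chain, where a null saved rbp marks
the bottom, is below. The names frame_record and walk_frames are made
up for illustration, and this models the general technique only; it is
not a claim about what GDB or LLDB do internally.

```c
#include <stddef.h>
#include <stdint.h>

// Layout left on the stack by "push %rbp; mov %rsp, %rbp":
// the saved rbp is at [rbp] and the return address at [rbp+8].
struct frame_record {
	struct frame_record *caller;  // saved rbp of the calling function
	uintptr_t            ret;     // return address into the caller
};

// Walks the chain starting at fp and stores up to max return addresses
// in out[]. A null saved rbp is treated as the bottom of the stack.
// Returns the number of frames visited.
static size_t walk_frames(const struct frame_record *fp,
                          uintptr_t *out, size_t max)
{
	size_t n = 0;
	while (fp != NULL && n < max) {
		out[n++] = fp->ret;
		fp = fp->caller;
	}
	return n;
}
```

With this scheme, zeroing rbp before jumping to the thread entry point
means the first frame record the entry function creates has a null
caller, which is where such a walker stops.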

> I know it's not strictly tied to DWARF because backtraces are clean on the
> main thread, even with an assembly entry point. Maybe it's the guard page
> like you said, except that the main stack has the argv, envp, and auxv
> arrays beyond the last stack frame. I couldn't quickly locate the code in
> GDB responsible for it, so I have no idea how GDB decides to stop.

I actually know this one: GDB special-cases main[0]. Assuming
that by PC they mean the instruction pointer in the code that
follows, the check on line 2731 appears to try to handle the
case I created, where the value is 0. However, they assume that
the frame being checked is not the base frame, which is incorrect
in our case. It might just be a bug, but I don't know how deep I
want to dive into understanding GDB's source code.

> I gave LLDB a shot at generating backtraces, and either zeroing rbp or
> pushing a zero return pointer is sufficient to communicate a backtrace
> sentinel. The obvious stuff works. (This result does not surprise me.)

Interesting, I should have checked. Unfortunately I don't think
TUI-based debuggers are useful for any real debugging and I don't
see any reasonably complete frontends for LLDB (besides VS Code,
which I won't use). At least GDB has nakst's gf[1].

[0]: https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/frame.c;h=a6900b280724a0fdc77611b5076b033553fbe6c2;hb=HEAD#l2675
[1]: https://github.com/nakst/gf

--
https://rnpnr.xyz/
GPG Fingerprint: B8F0 CF4C B6E9 415C 1B27 A8C4 C8D2 F782 86DF 2DC5