~skeeto/public-inbox


Re: Practical libc-free threading on Linux

Message ID
<D5X8GJORCS80.34CNR97XCPVT2@gmail.com>
DKIM signature
pass
(Link to your original blog post:
https://nullprogram.com/blog/2023/03/23/)

Hey! Great post; it really helped me understand what clone was
actually doing.

As a small exercise, I am trying to get your implementation working with
clone3, but I am having some trouble, and I was hoping that you could
help me out. Your implementation 'just worked' after I dropped in your
newthread and set up the stack pointer as you did. With clone3, however,
I cannot get the entry function to run, even though clone and clone3 do
not seem substantially different. The only meaningful difference I can
see that could be giving me trouble is in these lines from the man
pages:

(https://man7.org/linux/man-pages/man2/clone.2.html)

clone states -
  "stack usually points to the topmost address of the memory space set
  up for the child stack"

while clone3 states -

  "The stack for the child process is specified via cl_args.stack, which
  points to the lowest byte of the stack area, and cl_args.stack_size,
  which specifies the size of the stack in bytes."

Are these statements the same? The language seems ambiguous, since (in
my experience) top and bottom, high and low are often used
interchangeably when talking about the stack. In the case of clone, one
can deduce that the pointer must point to the highest numerical address
of the stack (offset, of course, by the size of stack_head so that
stack_head.entry is called by the ret instruction in newthread),
because without stack_size there is no way to position the stack
pointer correctly for a downward-growing stack. In the case of clone3,
however, we cannot make the same deduction, because stack_size is
present. Can you clear this up for me? Where is clone_args.stack
supposed to point? Is it the same address as it would be for clone, or
should it be the address returned from malloc, with stack_size
decremented by the size of stack_head?

Below is my implementation. It is mostly the same as yours; however, I
removed the line from newthread that copies %rdi into %rsi, because with
clone3 the syscall arguments should already be in the correct registers
when my newthread is called (am I correct?).

Can you see where I have gone wrong?

__attribute((naked))
static long newthread(struct clone_args __attribute((unused)) *args,
                      long __attribute((unused)) args_size)
{
  // SYS_clone3 == 435
  __asm volatile (
      "mov $435, %%eax\n"  // SYS_clone3
      "syscall\n"
      "mov %%rsp, %%rdi\n" // entry point argument
      "ret\n"
      : : : "rax", "rcx", "rsi", "rdi", "r11", "memory"
  );
}

int main() {
  u64 stack_size = 1024 * 1024;
  struct stack_head *stack = malloc(stack_size);

  stack = stack + stack_size / sizeof(*stack) - 1;
  stack->entry = entry;
  stack->join = 0;

  struct clone_args maybe_unused args = {};
  args.stack       = (u64) stack;
  args.stack_size  = stack_size;
  args.exit_signal = SIGCHLD;
  args.flags       = CLONE_FILES|CLONE_FS|CLONE_SIGHAND|
                     CLONE_SYSVSEM|CLONE_THREAD|CLONE_VM;

  newthread(&args, sizeof(args));

  // ...
}

I do not think that there is anything else in my code that could be
causing entry to never be called, so I left it out for brevity.

Also, thanks so much for the blog post and for putting your
implementation in the public domain. Good learning resources are often
very difficult to find, so articles by experts like you are incredibly
useful and deeply inspiring; it is exciting to see how much there is
still to learn after years of practice and exploration.

Re: Practical libc-free threading on Linux

Message ID
<35S4A9DZIOWO9.2EHQMQLRR01KJ@rnpnr.xyz>
In-Reply-To
<D5X8GJORCS80.34CNR97XCPVT2@gmail.com>
DKIM signature
permerror
"Sol" <solomoncardenbrown@gmail.com> wrote:
> Are these statements the same? The language seems ambiguous, as (in my
> experience) top and bottom, high and low are often used interchangeably
> when talking about the stack.

Definitely not. On all common architectures the stack grows down:
the top of the stack is the highest address, and the bottom of the
stack is the lowest address. Instructions such as push/pop have this
directionality baked into them.
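On x86-64 that directionality can be observed directly. A minimal sketch, assuming x86-64 and GCC/Clang extended asm (push_delta is a hypothetical helper name): it records %rsp before and after one push. It first moves %rsp past the 128-byte red zone so the push cannot clobber anything the compiler keeps there.

```c
#include <stdint.h>

/* x86-64 only (hypothetical helper): how far does one push move %rsp? */
static long push_delta(void)
{
    uintptr_t before, after;
    __asm volatile (
        "sub $128, %%rsp\n\t"  /* step past the red zone first */
        "mov %%rsp, %0\n\t"    /* %rsp before the push */
        "push %0\n\t"          /* stores 8 bytes at a lower address */
        "mov %%rsp, %1\n\t"    /* %rsp after the push */
        "pop %0\n\t"           /* pop moves %rsp back up (restores %0) */
        "add $128, %%rsp"
        : "=r"(before), "=r"(after)
        :
        : "memory");
    return (long)(before - after);
}
```

A positive result (8) means the push moved %rsp toward lower addresses, i.e. the stack grows down.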

> Where is clone_args.stack supposed to point to? Is
> it the same address as it would be for clone, or should it be the same
> address as the one returned from malloc, with stack_size decremented by
> the size of stack_head?

It is the lowest memory address to be used for the stack, i.e., the
one returned from malloc. After the syscall, %rsp in the child will be
equal to (stack + stack_size) from the passed-in clone_args.
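Since the child starts with %rsp equal to (stack + stack_size), and newthread's final ret must pop stack_head.entry, the header has to sit exactly at that address. A minimal sketch of the arithmetic, assuming a stack_head layout modeled on the blog post's (only entry at offset 0 matters here); clone3_child_rsp and clone3_stack_size are hypothetical helper names:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header modeled on the blog's; only entry's position at
   offset 0 matters for newthread's final ret. */
struct __attribute((aligned(16))) stack_head {
    void (*entry)(struct stack_head *);
    int join;
};

/* The child's initial %rsp for clone3 on a downward-growing stack. */
static uint64_t clone3_child_rsp(uint64_t stack, uint64_t stack_size)
{
    return stack + stack_size;
}

/* Put the header at the top of [base, base + size) and return the
   stack_size to pass to clone3 so the child's %rsp lands on it. */
static uint64_t clone3_stack_size(void *base, size_t size,
                                  struct stack_head **head)
{
    *head = (struct stack_head *)base + size / sizeof(**head) - 1;
    return (uint64_t)((char *)*head - (char *)base);
}
```

For example, with a 1 MiB region and this 16-byte header, the stack_size passed to clone3 comes out as 1 MiB minus 16, while clone_args.stack stays at the base address.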

> Below is my implementation. It is mostly the same as yours, however I
> removed the line from newthread which puts %rdi in %rsi, as using
> clone3, the arguments for the syscall should already be in the correct
> registers (am I correct?) with my newthread implementation.

Yes. As per the System V calling convention [0, p. 21], %rdi holds
the first argument of the function (assuming it fits in 8 bytes)
and %rsi holds the second. Linux syscalls on x86-64 expect their
first two arguments in those same registers, with the syscall number
in %rax.
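To illustrate the register overlap, here is a sketch of a two-argument raw syscall, assuming x86-64 Linux and GCC/Clang extended asm (raw_syscall2 and raw_getpid are hypothetical names). The number goes in %rax and the arguments in %rdi and %rsi; syscall itself clobbers %rcx and %r11. The two conventions agree for the first three arguments and diverge at the fourth (%rcx for functions, %r10 for syscalls), which is why a two-argument clone3(args, size) can be issued straight from newthread's incoming registers.

```c
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* getpid(), for comparison */

/* x86-64: number in %rax, arguments in %rdi and %rsi. */
static long raw_syscall2(long n, long a, long b)
{
    long ret;
    __asm volatile (
        "syscall"
        : "=a"(ret)
        : "a"(n), "D"(a), "S"(b)
        : "rcx", "r11", "memory");
    return ret;
}

/* Usage: a harmless syscall whose result is easy to check. */
static long raw_getpid(void)
{
    return raw_syscall2(SYS_getpid, 0, 0);
}
```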

> Can you see where I have gone wrong?
>
> int main() {
>   u64 stack_size = 1024 * 1024;
>   struct stack_head *stack = malloc(stack_size);
>
>   stack = stack + stack_size / sizeof(*stack) - 1;
>   stack->entry = entry;
>   stack->join = 0;
>
>   struct clone_args maybe_unused args = {};
>   args.stack       = (u64) stack;
>   args.stack_size  = stack_size;
>   args.exit_signal = SIGCHLD;
>   args.flags       = CLONE_FILES|CLONE_FS|CLONE_SIGHAND|
>                      CLONE_SYSVSEM|CLONE_THREAD|CLONE_VM;
>
>   newthread(&args, sizeof(args));
>
>   // ...
> }

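The main issue is that your args.stack points to the top of the
stack instead of the bottom. If you correct that like so, sizing the
stack so that the child's %rsp (stack + stack_size) lands exactly on
stack_head and the ret in newthread pops stack_head->entry:

void *stack = malloc(stack_size);
struct stack_head *stack_head = (struct stack_head *)stack + stack_size / sizeof(*stack_head) - 1;
args.stack      = (u64) stack;
args.stack_size = (u64) ((char *)stack_head - (char *)stack);

and fill in stack_head->entry and stack_head->join as before,
everything should work fine. Also note that clone3 rejects a nonzero
exit_signal when CLONE_THREAD is set (it fails with EINVAL), so leave
args.exit_signal at 0.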
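A complete sketch combining the pieces in this thread, assuming x86-64 Linux 5.3 or later, GCC or Clang, and kernel headers that provide struct clone_args in <linux/sched.h>. The names spawn, finish, and join, and the arg/result/base fields, are hypothetical additions, not the blog's code. Two details matter: stack_size excludes the header so the child's %rsp lands on it, and exit_signal stays 0 because clone3 returns EINVAL for a nonzero exit_signal combined with CLONE_THREAD. The thread's exit path runs in a single asm block, so the thread does not touch its stack after signalling the joiner.

```c
#define _GNU_SOURCE
#include <linux/futex.h>   /* FUTEX_WAIT */
#include <linux/sched.h>   /* struct clone_args, CLONE_* flags */
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>   /* SYS_futex */
#include <unistd.h>        /* syscall() */

struct __attribute((aligned(16))) stack_head {
    void (*entry)(struct stack_head *);  /* must stay at offset 0 */
    int join;       /* futex word: 0 = running, 1 = finished */
    long arg;       /* hypothetical input */
    long result;    /* hypothetical output */
    void *base;     /* lowest address, kept so join() can free it */
};

/* clone3(args, size): both arguments already sit in %rdi and %rsi. */
__attribute((naked))
static long newthread(struct clone_args *args __attribute((unused)),
                      unsigned long size __attribute((unused)))
{
    __asm volatile (
        "mov $435, %%eax\n"   /* SYS_clone3 */
        "syscall\n"
        "mov %%rsp, %%rdi\n"  /* child: %rsp == head, the entry argument */
        "ret\n"               /* child pops head->entry; parent returns */
        : : : "rax", "rcx", "rsi", "rdi", "r11", "memory");
}

/* Publish join = 1, wake the joiner, and exit this thread only. */
__attribute((noreturn))
static void finish(int *join)
{
    __asm volatile (
        "movl $1, (%%rdi)\n"
        "mov $202, %%eax\n"   /* SYS_futex */
        "mov $1, %%esi\n"     /* FUTEX_WAKE */
        "mov $1, %%edx\n"     /* wake at most one waiter */
        "syscall\n"
        "mov $60, %%eax\n"    /* SYS_exit */
        "xor %%edi, %%edi\n"
        "syscall\n"
        : "+D"(join) : : "rax", "rcx", "rdx", "rsi", "r11", "memory");
    __builtin_unreachable();
}

static void entry(struct stack_head *head)
{
    head->result = head->arg * 2;   /* placeholder work */
    finish(&head->join);
}

static struct stack_head *spawn(long arg, size_t size)
{
    void *base = aligned_alloc(16, size);
    if (!base) return 0;
    struct stack_head *head = (struct stack_head *)base + size/sizeof(*head) - 1;
    head->entry = entry;
    head->join = 0;
    head->arg = arg;
    head->base = base;

    struct clone_args args = {0};   /* exit_signal stays 0 */
    args.flags = CLONE_FILES | CLONE_FS | CLONE_SIGHAND |
                 CLONE_SYSVSEM | CLONE_THREAD | CLONE_VM;
    args.stack = (uint64_t)(uintptr_t)base;                    /* lowest address */
    args.stack_size = (uint64_t)((char *)head - (char *)base); /* %rsp = head */
    if (newthread(&args, sizeof(args)) < 0) {
        free(base);
        return 0;
    }
    return head;
}

static long join(struct stack_head *head)
{
    while (!__atomic_load_n(&head->join, __ATOMIC_ACQUIRE))
        syscall(SYS_futex, &head->join, FUTEX_WAIT, 0, (void *)0);
    long result = head->result;
    free(head->base);
    return result;
}
```

Usage: `struct stack_head *t = spawn(21, 1 << 16);` followed later by `join(t)`, which returns 42 once the thread has finished.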

- Randy

[0]: https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf

--
https://rnpnr.xyz/
GPG Fingerprint: B8F0 CF4C B6E9 415C 1B27 A8C4 C8D2 F782 86DF 2DC5

Re: Practical libc-free threading on Linux

Message ID
<20241201234235.lezfluaqb4cg2goq@nullprogram.com>
In-Reply-To
<D5X8GJORCS80.34CNR97XCPVT2@gmail.com>
DKIM signature
missing
Hi, Sol, I'm happy you're taking this concept and running with it. There's 
an almost identical discussion a year ago here, with the same issue:

https://lists.sr.ht/~skeeto/public-inbox/%3CNi9xBAu--3-9@tutanota.com%3E

The original clone doesn't know the size of the stack, so it must accept
the high address (for downward-growing stacks). If you gave it the low
address, it couldn't determine the high address. Since clone3 knows the
stack size, it follows the normal convention of accepting the low address
in all cases, just as everything else normally does.
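The two conventions can be compared side by side. A sketch, paraphrasing (as I understand it) the kernel's handling of the clone3 (stack, stack_size) pair on a downward-growing stack, ignoring its access_ok range check; clone3_child_sp and legacy_clone_stack_arg are hypothetical names. A zero stack with a nonzero size is rejected, and so is a nonzero stack with a zero size; otherwise the child's stack pointer is the same high address the old clone expected.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns false for an invalid pair; otherwise stores the child's
   initial stack pointer in *sp (0 means "keep the parent's %rsp"). */
static bool clone3_child_sp(uint64_t stack, uint64_t stack_size, uint64_t *sp)
{
    if (stack == 0) {
        *sp = 0;
        return stack_size == 0;   /* a size without a base is invalid */
    }
    if (stack_size == 0)
        return false;             /* a base without a size is invalid */
    *sp = stack + stack_size;     /* high end of [stack, stack + size) */
    return true;
}

/* The legacy clone() is handed that high address directly. */
static uint64_t legacy_clone_stack_arg(uint64_t base, uint64_t size)
{
    return base + size;
}
```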

(Randy's response has you covered for the rest.)

Re: Practical libc-free threading on Linux

Message ID
<D60T7005DLGR.1L2IPH9T71WOZ@gmail.com>
In-Reply-To
<20241201234235.lezfluaqb4cg2goq@nullprogram.com>
DKIM signature
pass
> Since clone3 knows the stack size, it follows the normal convention of
> accepting the low address in all cases.

This makes sense; thanks for the clarification. I will get back to my
implementation!

> Hi, Sol, I'm happy you're taking this concept and running with it. There's 
> an almost identical discussion a year ago here, with the same issue:

My apologies for not spotting the existing discussion. I only checked
the ones under the original blog post; I should have looked a bit
deeper!

Sol