Hello! I'm trying to learn how I can spawn threads as I'm going to need it
for the system library of the new language I'm making. So, I have read the
stack head example and I understood it for the most part (and probably the
parts that I don't fully understand are what cause my problem). I'm trying to
create the example using the "clone3" system call as this is the system call
I want to use, but I have had no luck doing it.
I have created the "CloneArgs" structure and I have modified the "newthread"
function to take a "CloneArgs" type argument instead, as the stack head is
stored in its "stack" field for "clone3". I have also tried to do it using "futex"
and as this didn't work, I thought that this might be the problem as I know
NOTHING (other than reading the man pages and understanding a little bit
how they work in theory) about a "futex". I also thought about trying to use
the "waitid" system call. But in that version, the "waitid" system call also fails.
After so many hours of trying to understand, I finally thought about either
giving up (which I really, really don't want to) or ask for help. So, here I am!
Of course, I must say that I have spent hours trying to experiment and I have
also read the man pages, but I don't fully understand anything and I probably
miss or forget something. It's also the first time I see these attributes and I know
only some basic assembly, so I don't know if there is something wrong with the
inline assembly that I don't understand. One thing that I don't understand for sure
is the "mov %%rsp, %%rdi" assembly line in the "newthread" function. Why does
"rdi" need to have the value of "rsp"?
So yeah, sorry for that big thread. Below are the 3 files (1 header file and 2 source
files with the 2 different versions) so you can properly see what I'm trying to do
and if you can spot the mistake, explain to me what I did wrong. But thank you for
your work and time regardless of the outcome!
Header file: https://opengist.thomice.li/rempas/06510862a2474ee984c96ec6fe122919
Source file (futex version): https://opengist.thomice.li/rempas/21be0228a086440484f06dff58d886b5
Source file (waitid version): https://opengist.thomice.li/rempa/507560d6dfc649769d47bbc6b886a8ef
I'm glad to see that you've given this a shot! You're quite close, but
there are a few subtle details to sort out.
First, unlike clone(), clone3() takes a pointer to the low end of the
stack, which is mentioned on the man page. The original clone() does not
know the stack size, so it takes a pointer to the high end, as otherwise
the kernel could not find it. That simplified my demo, too. So you must
modify newstack() to return both ends of the stack, the high end for you
to populate and the low end to give to the kernel. I defined this to be
returned from newstack():
typedef struct {
    void *tail;
    stack_head *head;
} thread_stack;
Where "tail" points to the address returned by mmap, and "head" is the
original return value.
Second, you were passing the wrong address for the stack, using "&stack"
instead of just "stack". That means you were using the local variable as
the low address for the bottom of the stack, and disaster ensued. At this
very low level it's not possible to type check, so this hazard comes with
the territory.
Third, due to rounding when computing "count" the stack size is not
actually 1<<16 even though that's what you allocated. Because you're
passing the low end of the stack you need to give it exactly the offset to
the stack head. If you have a "thread_stack" per above, that looks like:
@@ -142,4 +150,4 @@
0, 0, 0, SIGCHLD /* I have also tried using "0" here */,
- (__aligned_u64)&stack, /* stack */
- 1 << 16, /* stack_size */
+ (__aligned_u64)stack.tail, /* stack */
+ (void *)stack.head - stack.tail, /* stack_size */
0, 0, 0, 0
This diff fixes both the second and third items. With these changes it
works as expected on my system. When you're ready to share I'm interested
in seeing this new programming language you're building, too.
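To make the first and third points concrete, here is a minimal sketch of a newstack() that reports both ends, plus a helper that computes the byte count to hand to clone3 as stack_size. It is an illustration under assumptions, not the code from the linked files: the stack_head fields are stand-ins, stack_span() is a hypothetical helper name, and it calls the libc mmap() wrapper rather than a raw system call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Stand-in for the demo's stack_head; the real fields differ (assumption). */
typedef struct __attribute__((aligned(16))) stack_head {
    void (*entry)(struct stack_head *);
    int join_futex;
} stack_head;

typedef struct {
    void       *tail;  /* low end: the address mmap() returned */
    stack_head *head;  /* high end: last stack_head slot in the mapping */
} thread_stack;

/* Map `size` bytes and report both ends; tail is NULL on failure. */
static thread_stack newstack(long size)
{
    thread_stack s = {0};
    assert(size >= (long)sizeof(stack_head));
    void *p = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return s;
    }
    long count = size / (long)sizeof(stack_head);
    s.tail = p;
    s.head = (stack_head *)p + count - 1;
    return s;
}

/* Hypothetical helper: bytes between tail and head, i.e. the value to
 * pass as stack_size so the new thread starts exactly at the head. */
static size_t stack_span(thread_stack s)
{
    return (size_t)((char *)s.head - (char *)s.tail);
}
```

With a 1<<16 byte mapping, stack_span() comes out one stack_head short of 1<<16, which is why passing the allocation size directly puts the initial stack pointer in the wrong place.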
Thank you so much for the answer, and sorry for not replying earlier.
I don't want to sound like a total n00b (but the truth is that I am one in
this case) but I tried to apply the code and cannot seem to make it
work. I added the "thread_stack" structure and modified "newthread"
to return it (which is probably where I made the mistake). Then, I added the
"patch" manually and modified the "stack.head" data (and the call to
"futex_wait") as needed! I have updated the link with the source code
if you want to check it out and tell me what I did wrong (once again...).
https://opengist.thomice.li/rempas/21be0228a086440484f06dff58d886b5
Thank you so so so so so much for properly explaining things so I can
understand. Words cannot describe how much I LOVE people who do
that. If you don't have time or patience to explain, I would be very grateful
if you posted the final code and let me figure out myself what I did wrong.
Finally, you can learn about Nemesis by looking at the official public repo here:
https://codeberg.org/rempas/nemesis
I'm very happy that you are interested, and I do wonder how it will look from
the perspective of an experienced low-level programmer. I was planning to
push a commit and then read and reply to this, but that will have to wait
2-3 days as coding bugs are unpredictable, as you probably know better
than me!
You've almost got it. You just need to fix three mistakes in two lines of
newstack. The first two you can reason about by dimensional analysis.
1. "count" should be the number of stack_head objects that fit in the
stack, not thread_stack objects. I suspect this is a typo.
2. Since "count" is a number of stack_head objects, pointer arithmetic
involving it should be in terms of "stack_head *" pointers, not raw bytes
("void *"). The stack high end is the last element of a "stack_head array"
that forms the stack. (Note: And *not* "one past the end." You did not make
this mistake, but I want to highlight it. The last stack_head element is
just beyond the stack as seen by the thread, serving as a kind of metadata
much like argc+argv+envp+aux on the main thread stack.)
3. The "tail" is the low end, and as I mentioned, holds the original
mmap() return value. I left it as a "void *" because it's semantically
just some raw memory. The "head" is the high end, where the meaningful
stack_head object is situated. The stack grows from head to tail. In your
code you have the high/head and low/tail ends mixed up. Perhaps I should
have named it high and low.
Here are the above three changes in patch form, after which your program
works for me:
--- a/stack_head_clone3_futex.c
+++ b/stack_head_clone3_futex.c
@@ -134,2 +134,2 @@ static thread_stack newstack(long size) {
- long count = size / sizeof(thread_stack);
- thread_stack stack = { (void*)(p + count - 1), (void*)count };
+ long count = size / sizeof(stack_head);
+ thread_stack stack = { (void *)p, (stack_head *)p + count - 1 };
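The dimensional analysis behind items 1 and 2 can be seen in isolation with a small sketch. This is an illustration only: the stack_head fields are stand-ins, and both function names are hypothetical. The same "base + count - 1" expression lands at very different byte offsets depending on the pointer type it is applied to.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the demo's stack_head; the real fields differ (assumption). */
typedef struct __attribute__((aligned(16))) stack_head {
    void (*entry)(struct stack_head *);
    int join_futex;
} stack_head;

/* Index in stack_head units: count - 1 elements above base.
 * `base` must be suitably aligned for stack_head. */
static size_t head_offset_typed(char *base, long size)
{
    assert(size >= (long)sizeof(stack_head));
    long count = size / (long)sizeof(stack_head);
    stack_head *head = (stack_head *)base + count - 1;
    return (size_t)((char *)head - base);
}

/* Same expression on a byte pointer: count - 1 is taken as bytes,
 * so the result is nowhere near the high end of the region. */
static size_t head_offset_bytes(char *base, long size)
{
    assert(size >= (long)sizeof(stack_head));
    long count = size / (long)sizeof(stack_head);
    char *head = base + count - 1;
    return (size_t)(head - base);
}
```

For a region of "size" bytes, the typed version returns size minus one stack_head (the last slot), while the byte version returns only count - 1, a small fraction of the way up. Dividing by sizeof(thread_stack) instead of sizeof(stack_head) skews "count" in the same way.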