Consider the following C function:
```c
#include <stdio.h>
int main() {
    printf("%p\n", __builtin_frame_address(0));
    return 0;
}
```
which prints the frame pointer of the current function (using GCC's
`__builtin_frame_address` builtin).
Doing this by hand typically involves a small assembly routine like:
```s
.text
.balign 4
.global _get_frame_address
_get_frame_address:
mov x0, x29
ret
```
where x29 stores the frame pointer on aarch64.
In LLVM, this is an intrinsic, which allows it to be platform-agnostic
(as clang also usually supports all of the gcc builtins):
```llvm
target triple = "arm64-apple-macosx13.0.0"
@.str = private unnamed_addr constant [4 x i8] c"%p\0A\00", align 1
define i32 @main() #0 {
%1 = alloca i32, align 4
store i32 0, i32* %1, align 4
%2 = call i8* @llvm.frameaddress.p0i8(i32 0)
%3 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i8* %2)
ret i32 0
}
declare i32 @printf(i8*, ...) #1
declare i8* @llvm.frameaddress.p0i8(i32 immarg) #2
```
I doubt it, but I'll ask just in case:
Does QBE expose any way to reliably get the current stack frame pointer?
I don't particularly want to write assembly to get the pointer for each
supported platform and link with the correct one, but if it's necessary
then whatever.
The purpose of this is to inspect the stack for GC roots: get the fp of
the main function, get the fp in the collection function, and scan
between them for pointers that point into the heap.
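As a rough illustration of that scan, here is a C sketch. The names `count_roots` and `count_roots_on_stack` are made up for this example, the heap is assumed to be one contiguous region, and the stack is assumed to grow downwards. It is conservative: any word that happens to look like a heap pointer counts as a root.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts the words in [lo, hi) whose value falls
   inside the heap region [heap, heap + used). Every hit is treated as
   a potential root. */
size_t count_roots(void **lo, void **hi, const void *heap, size_t used) {
    uintptr_t start = (uintptr_t)heap;
    uintptr_t end = start + used;
    size_t hits = 0;
    for (void **slot = lo; slot < hi; slot++) {
        uintptr_t p = (uintptr_t)*slot;
        if (p >= start && p < end)
            hits++;
    }
    return hits;
}

/* Scans from this function's own frame up to stack_base, which the
   caller recorded earlier (for example in main). Assumes GCC or clang
   and a downward-growing stack. */
__attribute__((noinline))
size_t count_roots_on_stack(void **stack_base, const void *heap, size_t used) {
    void **here = __builtin_frame_address(0);
    return count_roots(here, stack_base, heap, used);
}
```

Which slots actually fall between the two frame addresses depends on the frame layout of the target, so the range may need adjusting per platform.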
Proof of concept in Elle (which does work) to find the allocated
pointers in the linked list of arenas from the stack, essentially
finding the roots of allocations on the stack (not the most efficient
but you get the idea):
```elle
use std/prelude;
external fn get_frame_address() -> void *;
fn other(void **stack_base) {
void **stack_start = get_frame_address();
// the stack growing downwards is another assumption made here which
// isn't platform-agnostic, but that can easily be found without any assembly
while stack_start < stack_base {
let current = #env.allocator.current;
let p = *stack_start;
while current {
if current.buffer <= p && p < current.buffer + current.used {
io::cprintf("discovered: %p\n", p);
}
current = current.next;
}
// for the record, pointer arithmetic is not done by element size
stack_start += 8;
}
}
fn main() {
void *a = #env.allocator.alloc(24);
void *b = #env.allocator.alloc(24);
io::cprintf("root: %p\n", a);
io::cprintf("root: %p\n", b);
other(get_frame_address() + 8);
}
```
where the expected output is something like:
```
root: 0x140008000
root: 0x140008018
discovered: 0x140008000
discovered: 0x140008018
```
but of course this depends entirely on the assembly implementation
being linked at compile time, so you must do
`cc -c -o fp.o fp.s && ellec test.le fp.o && ./test`
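The stack-direction assumption mentioned in the Elle comment can be checked without assembly. A minimal C sketch, assuming a GCC or clang style `noinline` attribute; `callee_is_below` and `stack_grows_down` are hypothetical names, not part of Elle or QBE:

```c
#include <stdint.h>

/* Hypothetical helper: compares the address of a local in the caller
   with the address of a local in a non-inlined callee. */
__attribute__((noinline))
static int callee_is_below(volatile char *caller_local) {
    volatile char callee_local = 0;
    return (uintptr_t)&callee_local < (uintptr_t)caller_local;
}

/* Returns 1 if the stack appears to grow towards lower addresses,
   0 otherwise. */
int stack_grows_down(void) {
    volatile char caller_local = 0;
    return callee_is_below(&caller_local);
}
```

The result could be computed once at startup and used to decide in which direction the scan loop walks.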
Any further information about whether something exists in QBE would be
greatly appreciated.
Update: I wrote a small runtime helper which automatically compiles and
links a tiny amount of assembly for the architecture that the compiler
was compiled on (only aarch64 and x86_64 are supported for now):
```s
.text
.balign 4
.global ___internal_get_frame_address
___internal_get_frame_address:
mov x0, x29
ret
```
```s
.text
.globl __internal_get_frame_address
.p2align 4, 0x90
.type __internal_get_frame_address,@function
__internal_get_frame_address:
movq %rbp, %rax
retq
```
This is not entirely maintainable but whatever, it works. If anyone has
any way to make this simpler then I'd love to hear it.
> You can rely on your C compiler and write a stub
> that wraps __builtin_frame_address(1). This stub
> is then easily called from qbe il.
Sorry, I only just saw this. That might actually be a simpler way to do
it. Thank you, I think that should work.
Yep, this works:
```c
void *__internal_get_frame_address() {
return __builtin_frame_address(1);
}
```
At compile time, I compile this and link with the object. I'm not sure
why I didn't think of this lol, thank you.
Quoth Rosie <acquitefx@gmail.com>:
> Update: I wrote a somewhat automatic runtime thing which automatically
> compiles and links a tiny amount of assembly for the architecture that
> the compiler was compiled on (only aarch64 and x86_64 are supported for now)
> [...]
> This is not entirely maintainable but whatever, it works. If anyone has
> any way to make this simpler then I'd love to hear it.
If you're willing to be somewhat conservative, and your
C compiler (or QBE) doesn't inline it, you could also
do it in portable C:
    #include <stdint.h>

    /* named differently so it does not clash with the compiler builtin */
    uintptr_t
    get_frame_address(void)
    {
        int dummy;
        return (uintptr_t)&dummy;
    }
That was my original approach because I can do it from the language
itself so it could just be an stdlib function, but it's not as reliable
as I'd like. It sometimes fails to find 1/2 pointers (from the example
outlined above) while __builtin_frame_address always finds both, which
is strange because I'd expect the portable solution to also work reliably.
I seem to have found the issue -- QBE seems to optimize variables deemed
small enough into general purpose registers instead of pushing them to
the stack:
```s
mov w1, #24
bl _GCAllocator.alloc
adrp x0, ___internal.elle.__env__@page
add x0, x0, ___internal.elle.__env__@pageoff
ldr x0, [x0]
ldr x0, [x0]
```
vs. what it should be (for the garbage collector to find it as a root)
```s
bl _GCAllocator.alloc
add x1, x29, #32
str x0, [x1]
adrp x0, ___internal.elle.__env__@page
add x0, x0, ___internal.elle.__env__@pageoff
ldr x0, [x0]
ldr x0, [x0]
```
I seem to find no way to disable this optimization -- there is nothing
in the documentation about `volatile` or `registers` and cproc doesn't
implement volatile-annotated variables for probably the same reason.
I have a really evil idea which is to call a function with the address
every time any stack memory is allocated to force it to be pushed to the
stack, but I *really* don't want to resort to something so horrible.
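For comparison, C compilers can have the same problem, and the usual workaround there looks a lot like that idea. A sketch assuming GCC or clang extended asm; `pin_slot` and `store_root` are hypothetical names, and this says nothing about what QBE itself guarantees:

```c
/* Hypothetical helper: an opaque call that takes the address of a stack
   slot. The empty asm with a "memory" clobber tells GCC and clang that
   the pointed-to memory may be read, so the pending store is emitted. */
__attribute__((noinline))
void pin_slot(void *slot) {
    __asm__ volatile("" : : "r"(slot) : "memory");
}

/* Example use: publish a freshly allocated pointer to its stack slot
   before anything that might trigger a collection runs. */
void *store_root(void **slot, void *value) {
    *slot = value;
    pin_slot(slot);
    return *slot;
}
```

The cost is one extra call per rooted slot, which is the part that makes the idea unattractive.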
Quoth Rosie <acquitefx@gmail.com>:
> I seem to have found the issue -- QBE seems to optimize variables deemed
> small enough into general purpose registers instead of pushing them to
> the stack
even if you take their address? that seems odd.
Yeah, it's kinda weird. This is the IR:
```qbe
function w $foo() {
@start
%a.addr =l alloc4 8
%tmp =l copy 0
storel %tmp, %a.addr
ret 0
}
```
```s
.text
.balign 4
_foo:
hint #34
stp x29, x30, [sp, -16]!
mov x29, sp
mov w0, #0
ldp x29, x30, [sp], 16
ret
/* end function foo */
```
As soon as I introduce some way for it to not optimize it:
```qbe
function w $noop(l %ptr) {
@start
ret 0
}
function w $foo() {
@start
%a.addr =l alloc4 8
%tmp =l copy 0
storel %tmp, %a.addr
call $noop(l %a.addr)
ret 0
}
```
```s
.text
.balign 4
_noop:
hint #34
stp x29, x30, [sp, -16]!
mov x29, sp
mov w0, #0
ldp x29, x30, [sp], 16
ret
/* end function noop */
.text
.balign 4
_foo:
hint #34
stp x29, x30, [sp, -32]!
mov x29, sp
add x1, x29, #24
mov x0, #0
str x0, [x1]
add x0, x29, #24
bl _noop
mov w0, #0
ldp x29, x30, [sp], 32
ret
/* end function foo */
```
It does store the value in the stack slot. I can't tell if this is
intentional or a bug.
After staring at it for a while, I think this behavior is actually QBE
just optimizing out unused stack allocations completely. I do wish there
was a way to prevent this.
Quoth Rosie <acquitefx@gmail.com>:
> I think this behavior is actually QBE just optimizing out unused stack
> allocations completely, staring at it for a while. I do wish there was a
> way to prevent this.
But it is used: it gets returned. Technically, this is UB in C, but
I wonder if that's what we want in QBE IR.
Quoth Rosie <acquitefx@gmail.com>:
> > But it is used: it gets returned.
>
> It is? but don't both functions use `mov w0, #0` ie returning 0?
my point is that this is (arguably) qbe doing not quite
the right thing.
Quoth Rosie <acquitefx@gmail.com>:
> Yeah, I suppose. From what I read, in LLVM when you do `alloca` you
> explicitly request a stack slot, so surely allocN in QBE should act the
> same way?
yeah, though llvm also optimizes out unused stack slots. I think
that my contention is that letting the address of a stack slot
escape should count as a use.
It's not so easy in qbe because the location of the frame ptr
on the stack differs between backends. On arm64 it is located
on top of the stack while on amd64 it is below locals.
Additionally, leaf functions on amd64 see their frame ptr
elided.
$ ./qbe -t amd64_sysv t.ssa
.text
.balign 16
.globl getfp
getfp:
endbr64
leaq 0(%rsp), %rax
ret
.type getfp, @function
.size getfp, .-getfp
/* end function getfp */
.section .note.GNU-stack,"",@progbits
$ ./qbe -t arm64_apple t.ssa
.text
.balign 4
.globl _getfp
_getfp:
hint #34
stp x29, x30, [sp, -16]!
mov x29, sp
add x0, x29, #16
ldp x29, x30, [sp], 16
ret
/* end function getfp */
$ cat t.ssa
export function l $getfp() {
@start
%fp =l alloc4 0
ret %fp
}
$