~mpu/qbe


__builtin_frame_address equivalent

Details
Message ID
<b0f4548d-9a20-4453-8f73-26a6fd9eefab@gmail.com>
Consider the following C function:

```c
#include <stdio.h>

int main() {
     printf("%p\n", __builtin_frame_address(0));
     return 0;
}
```

which prints the stack frame pointer of the current function (using the 
GCC __builtin_frame_address).

This typically involves something like:

```s
.text
.balign 4
.global _get_frame_address
_get_frame_address:
     mov x0, x29
     ret
```

where x29 stores the frame pointer on aarch64.


In LLVM, this is an intrinsic, which allows it to be platform-agnostic 
(as clang also usually supports all of the gcc builtins):

```llvm
target triple = "arm64-apple-macosx13.0.0"

@.str = private unnamed_addr constant [4 x i8] c"%p\0A\00", align 1

define i32 @main() #0 {
   %1 = alloca i32, align 4
   store i32 0, i32* %1, align 4
   %2 = call i8* @llvm.frameaddress.p0i8(i32 0)
   %3 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i64 0, i64 0), i8* %2)
   ret i32 0
}

declare i32 @printf(i8*, ...) #1
declare i8* @llvm.frameaddress.p0i8(i32 immarg) #2
```

I doubt it, but I'll ask just in case:

Does QBE expose any way to reliably get the current stack frame pointer?

I don't particularly want to write assembly to get the pointer for each 
supported platform and link with the correct one, but if it's necessary 
then whatever.

The purpose of this is to inspect the stack for roots for a GC (get the 
fp of the main function, get the fp at the collection function, and scan 
between them for pointers which are in the heap, etc.).
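
As a rough illustration of that scan, here is a minimal C sketch. It assumes a single contiguous heap region and a word-aligned range of stack slots; `heap_range` and `count_candidate_roots` are hypothetical names for illustration, not part of QBE or Elle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one heap region (not a QBE or Elle type). */
struct heap_range {
    uintptr_t start; /* address of the first byte of the region */
    size_t used;     /* number of bytes handed out so far */
};

/* Count the word-sized slots in [lo, hi) whose value points into `heap`.
 * This is conservative: any word that looks like a heap pointer is
 * treated as a root, whether or not it really is one. */
size_t count_candidate_roots(const uintptr_t *lo, const uintptr_t *hi,
                             const struct heap_range *heap)
{
    size_t found = 0;

    assert(lo <= hi);
    for (const uintptr_t *slot = lo; slot < hi; slot++) {
        uintptr_t p = *slot;
        /* The subtraction is only evaluated when p >= start, so it cannot wrap. */
        if (p >= heap->start && p - heap->start < heap->used)
            found++;
    }
    return found;
}
```

In a real collector, `lo` would be the frame address taken in the collection function and `hi` the one taken in main, assuming the stack grows downwards.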

Proof of concept in Elle (which does work) to find the allocated 
pointers in the linked list of arenas from the stack, essentially 
finding the roots of allocations on the stack (not the most efficient 
but you get the idea):

```elle
use std/prelude;
external fn get_frame_address() -> void *;

fn other(void **stack_base) {
     void **stack_start = get_frame_address();

     // the stack growing downwards is another assumption made here which
     // isn't platform-agnostic but that can be easily found without any assembly
     while stack_start < stack_base {
         let current = #env.allocator.current;
         let p = *stack_start;

         while current {
             if current.buffer <= p && p < current.buffer + current.used {
                 io::cprintf("discovered: %p\n", p);
             }

             current = current.next;
         }

         // for the record pointer arithmetic is not done by element size
         stack_start += 8;
     }
}

fn main() {
     void *a = #env.allocator.alloc(24);
     void *b = #env.allocator.alloc(24);
     io::cprintf("root: %p\n", a);
     io::cprintf("root: %p\n", b);
     other(get_frame_address() + 8);
}
```

where the expected output is something like:

```
root: 0x140008000
root: 0x140008018
discovered: 0x140008000
discovered: 0x140008018
```

but of course this is dependent solely on the implementation in assembly 
being linked at compile time so you must do

`cc -c -o fp.o fp.s && ellec test.le fp.o && ./test`
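
On the stack-direction assumption noted in the code comment: one way to check it without assembly is to compare the address of a local in a caller with one in a callee. This is only a sketch; it assumes GCC or Clang for `__attribute__((noinline))`, and both function names are hypothetical.

```c
#include <stdint.h>

/* Returns 1 if this function's local sits at a lower address than the
 * caller's local, 0 otherwise. noinline keeps the two locals in
 * separate frames. */
__attribute__((noinline))
int callee_is_below(volatile char *caller_local)
{
    volatile char callee_local = 0;
    /* Compare as integers; the two objects are unrelated, so comparing
     * the pointers directly would not be well defined in C. */
    return (uintptr_t)&callee_local < (uintptr_t)caller_local;
}

/* Returns 1 when frames of deeper calls are placed at lower addresses. */
int stack_grows_down(void)
{
    volatile char caller_local = 0;
    return callee_is_below(&caller_local);
}
```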

Any further information about whether something exists in QBE would be 
greatly appreciated.
Details
Message ID
<8b174533-9a93-4ad6-a6c6-013cb27c7dac@app.fastmail.com>
In-Reply-To
<b0f4548d-9a20-4453-8f73-26a6fd9eefab@gmail.com> (view parent)
You can rely on your C compiler and write a stub
that wraps __builtin_frame_address(1). This stub
is then easily called from qbe il.
Details
Message ID
<983139b3-7163-496d-b039-ad533869062c@gmail.com>
In-Reply-To
<b0f4548d-9a20-4453-8f73-26a6fd9eefab@gmail.com> (view parent)
Update: I wrote a somewhat automatic runtime thing which automatically 
compiles and links a tiny amount of assembly for the architecture that 
the compiler was compiled on (only aarch64 and x86_64 are supported for now)

```s
.text
.balign 4
.global ___internal_get_frame_address
___internal_get_frame_address:
     mov x0, x29
     ret
```

```s
.text
.globl	__internal_get_frame_address
.p2align	4, 0x90
.type	__internal_get_frame_address,@function
__internal_get_frame_address:
	movq	%rbp, %rax
	retq
```

This is not entirely maintainable but whatever, it works. If anyone has 
any way to make this simpler then I'd love to hear it.
Details
Message ID
<212c3798-9deb-41fd-9b9e-0d636ef8b19f@gmail.com>
In-Reply-To
<983139b3-7163-496d-b039-ad533869062c@gmail.com> (view parent)
> You can rely on your C compiler and write a stub
> that wraps __builtin_frame_address(1). This stub
> is then easily called from qbe il.

Sorry, I only just saw this. That might actually be a simpler way to do 
it. Thank you, I think that should work.
Details
Message ID
<15cce832-d380-4732-876f-3f5cd3306356@gmail.com>
In-Reply-To
<983139b3-7163-496d-b039-ad533869062c@gmail.com> (view parent)
Yep, this works:

```c
void *__internal_get_frame_address() {
     return __builtin_frame_address(1);
}
```

At compile time, I compile this and link with the object. I'm not sure 
why I didn't think of this lol, thank you.
Details
Message ID
<B9E2D3691DC80D8B14FC7901F90591A4@eigenstate.org>
In-Reply-To
<983139b3-7163-496d-b039-ad533869062c@gmail.com> (view parent)
Quoth Rosie <acquitefx@gmail.com>:
> Update: I wrote a somewhat automatic runtime thing which automatically 
> compiles and links a tiny amount of assembly for the architecture that 
> the compiler was compiled on (only aarch64 and x86_64 are supported for now)
> 
> ```s
> .text
> .balign 4
> .global ___internal_get_frame_address
> ___internal_get_frame_address:
>      mov x0, x29
>      ret
> ```
> 
> ```s
> .text
> .globl	__internal_get_frame_address
> .p2align	4, 0x90
> .type	__internal_get_frame_address,@function
> __internal_get_frame_address:
> 	movq	%rbp, %rax
> 	retq
> ```
> 
> This is not entirely maintainable but whatever, it works. If anyone has 
> any way to make this simpler then I'd love to hear it.
> 

If you're willing to be somewhat conservative, and your
C compiler (or QBE) doesn't inline it, you could also
do it in portable C:

	uintptr
	__builtin_frame_address(void)
	{
		int dummy;
		return (uintptr)&dummy;
	}
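
A compilable variant of the same idea, as a sketch only: it assumes GCC or Clang for `__attribute__((noinline))`, and `approx_frame_address` is a hypothetical name chosen so it does not clash with the compiler builtin. The result is an address somewhere inside the callee's frame, not the frame pointer itself.

```c
#include <stdint.h>

/* Approximate the current stack position by taking the address of a local.
 * noinline keeps the local in this function's own frame. */
__attribute__((noinline))
uintptr_t approx_frame_address(void)
{
    int dummy = 0;
    /* Going through a volatile integer keeps the compiler from treating
     * this as "returning the address of a local" and folding it away. */
    volatile uintptr_t addr = (uintptr_t)&dummy;
    return addr;
}
```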
Details
Message ID
<d3e49aa8-b47f-4744-89a3-489bad955cb3@gmail.com>
In-Reply-To
<983139b3-7163-496d-b039-ad533869062c@gmail.com> (view parent)
That was my original approach because I can do it from the language 
itself so it could just be an stdlib function, but it's not as reliable 
as I'd like. It sometimes fails to find 1/2 pointers (from the example 
outlined above) while __builtin_frame_address always finds both, which 
is strange because I'd expect the portable solution to also work reliably.
Details
Message ID
<a2bc398a-2818-4e04-83f3-52de63110dad@gmail.com>
In-Reply-To
<983139b3-7163-496d-b039-ad533869062c@gmail.com> (view parent)
I seem to have found the issue -- QBE seems to optimize variables deemed 
small enough into general purpose registers instead of pushing them to 
the stack:

```s
mov	w1, #24
bl	_GCAllocator.alloc
adrp	x0, ___internal.elle.__env__@page
add	x0, x0, ___internal.elle.__env__@pageoff
ldr	x0, [x0]
ldr	x0, [x0]
```

vs. what it should be (for the garbage collector to find it as a root)

```s
bl	_GCAllocator.alloc
add	x1, x29, #32
str	x0, [x1]
adrp	x0, ___internal.elle.__env__@page
add	x0, x0, ___internal.elle.__env__@pageoff
ldr	x0, [x0]
ldr	x0, [x0]
```

I seem to find no way to disable this optimization -- there is nothing 
in the documentation about `volatile` or `registers` and cproc doesn't 
implement volatile-annotated variables for probably the same reason.

I have a really evil idea which is to call a function with the address 
every time any stack memory is allocated to force it to be pushed to the 
stack, but I *really* don't want to resort to something so horrible.
Details
Message ID
<A9C8AFA37BC0334317E2CCFAE95E8BBC@eigenstate.org>
In-Reply-To
<a2bc398a-2818-4e04-83f3-52de63110dad@gmail.com> (view parent)
Quoth Rosie <acquitefx@gmail.com>:
> I seem to have found the issue -- QBE seems to optimize variables deemed 
> small enough into general purpose registers instead of pushing them to 
> the stack

even if you take their address? that seems odd.
Details
Message ID
<d4ecc277-d01e-4f38-9b84-a2fd2a75ecf4@gmail.com>
In-Reply-To
<983139b3-7163-496d-b039-ad533869062c@gmail.com> (view parent)
Yeah, it's kinda weird. This is the IR:

```qbe
function w $foo() {
@start
	%a.addr =l alloc4 8
	%tmp =l copy 0
	storel %tmp, %a.addr
	ret 0
}

```

```s
.text
.balign 4
_foo:
	hint	#34
	stp	x29, x30, [sp, -16]!
	mov	x29, sp
	mov	w0, #0
	ldp	x29, x30, [sp], 16
	ret
/* end function foo */
```

As soon as I introduce some way for it to not optimize it:

```qbe
function w $noop(l %ptr) {
@start
	ret 0
}

function w $foo() {
@start
	%a.addr =l alloc4 8
	%tmp =l copy 0
	storel %tmp, %a.addr
	call $noop(l %a.addr)
	ret 0
}
```

```s
.text
.balign 4
_noop:
	hint	#34
	stp	x29, x30, [sp, -16]!
	mov	x29, sp
	mov	w0, #0
	ldp	x29, x30, [sp], 16
	ret
/* end function noop */

.text
.balign 4
_foo:
	hint	#34
	stp	x29, x30, [sp, -32]!
	mov	x29, sp
	add	x1, x29, #24
	mov	x0, #0
	str	x0, [x1]
	add	x0, x29, #24
	bl	_noop
	mov	w0, #0
	ldp	x29, x30, [sp], 32
	ret
/* end function foo */
```

It does store the value in the stack slot. I can't tell if this is 
intentional or a bug.
Details
Message ID
<451aa6ed-c91f-46e5-a92f-6f46bf4b30be@gmail.com>
In-Reply-To
<983139b3-7163-496d-b039-ad533869062c@gmail.com> (view parent)
I think this behavior is actually QBE just optimizing out unused stack 
allocations completely, staring at it for a while. I do wish there was a 
way to prevent this.
Details
Message ID
<44759A056CC12295FE33235C621B5567@eigenstate.org>
In-Reply-To
<451aa6ed-c91f-46e5-a92f-6f46bf4b30be@gmail.com> (view parent)
Quoth Rosie <acquitefx@gmail.com>:
> I think this behavior is actually QBE just optimizing out unused stack 
> allocations completely, staring at it for a while. I do wish there was a 
> way to prevent this.
> 

But it is used: it gets returned. Technically, this is UB in C, but
I wonder if that's what we want in QBE IR.
Details
Message ID
<1e7e9377-1715-4116-837b-db190d61fffb@gmail.com>
In-Reply-To
<983139b3-7163-496d-b039-ad533869062c@gmail.com> (view parent)
 > But it is used: it gets returned.

It is? but don't both functions use `mov w0, #0` ie returning 0?
Details
Message ID
<E23BA1E6B11486789C2A87C6F92C76F9@eigenstate.org>
In-Reply-To
<1e7e9377-1715-4116-837b-db190d61fffb@gmail.com> (view parent)
Quoth Rosie <acquitefx@gmail.com>:
>  > But it is used: it gets returned.
> 
> It is? but don't both functions use `mov w0, #0` ie returning 0?
> 

my point is that this is (arguably) qbe doing not quite
the right thing.
Details
Message ID
<2c7f98d7-8175-40f5-a914-5ada8a238285@gmail.com>
In-Reply-To
<983139b3-7163-496d-b039-ad533869062c@gmail.com> (view parent)
Yeah, I suppose. From what I read, in LLVM when you do `alloca` you 
explicitly request a stack slot, so surely allocN in QBE should act the 
same way?
Details
Message ID
<24B3DAE43A09B49AD32D0BB346EFECCA@eigenstate.org>
In-Reply-To
<2c7f98d7-8175-40f5-a914-5ada8a238285@gmail.com> (view parent)
Quoth Rosie <acquitefx@gmail.com>:
> Yeah, I suppose. From what I read, in LLVM when you do `alloca` you 
> explicitly request a stack slot, so surely allocN in QBE should act the 
> same way?
> 

yeah, though llvm also optimizes out unused stack slots. I think
that my contention is that letting the address of a stack slot
escape should count as a use.
Details
Message ID
<e04ccfee-11eb-4a7c-b845-b49fa71b270b@app.fastmail.com>
In-Reply-To
<24B3DAE43A09B49AD32D0BB346EFECCA@eigenstate.org> (view parent)
It's not so easy in qbe because the location of the frame ptr
on the stack differs between backends. On arm64 it is located
on top of the stack while on amd64 it is below locals.
Additionally, leaf functions on amd64 see their frame ptr
elided.


$ ./qbe -t amd64_sysv t.ssa
.text
.balign 16
.globl getfp
getfp:
	endbr64
	leaq 0(%rsp), %rax
	ret
.type getfp, @function
.size getfp, .-getfp
/* end function getfp */

.section .note.GNU-stack,"",@progbits
$ ./qbe -t arm64_apple t.ssa
.text
.balign 4
.globl _getfp
_getfp:
	hint	#34
	stp	x29, x30, [sp, -16]!
	mov	x29, sp
	add	x0, x29, #16
	ldp	x29, x30, [sp], 16
	ret
/* end function getfp */

$ cat t.ssa
export function l $getfp() {
@start
	%fp =l alloc4 0
	ret %fp
}
$