~mpu/qbe


Implementing ARM32 Target

Details
Message ID
<arpqvfamz425yybp5e3nybxrljdpfgrmf5xya4qbk54zzviibj@27ripil2acij>
Hi everyone,

I'm interested in attempting to implement ARM32 as a target.  While I
have poked at the codebase a bit, no guarantees I could get it done.  
Although if I do get it implemented then I'm willing to commit to 
maintaining it as well, if necessary.

The main problem is that, while 32-bit architectures in QBE are solved in
theory ("just use `w` instead of `l` for pointers"), it seems to me that
they are not really solved in practice.  It'd be pretty disappointing to get
the backend implemented and working, but then not be able to use it for 
anything because no frontend implements the de facto incompatible 32-bit 
IR.  Seems like I'd have to convince every frontend to do potentially 
quite a bit of work---or attempt to do it all myself---in order to 
support 32-bit backends.  Not only just switching out the type used for 
pointers, but also possibly manually implementing narrowing for longs.

As discussed in a previous thread[1], there's the question of whether longs
should be supported.  It doesn't seem great to forbid `l` completely,
since supporting long integers is generally still desirable on 32-bit 
architectures and it'd be a pain to expect every single frontend to 
manually implement narrowing themselves (maybe not too bad if they 
already have narrowing for long longs implemented though).  But, if you 
allow `l` (narrowing internally), then it could make it difficult to 
debug a frontend if a pointer accidentally gets emitted as `l` instead 
of `w`.  The linked thread seemed to settle on forbidding `l` altogether 
but I wouldn't feel comfortable summarily making that decision myself.

There's also the question of how (or if) to support different 
architecture versions.  I'm personally interested in ARMv4T and ARMv5TE 
targets specifically, but they're very old and there are later revisions 
that have instructions that offer performance/code size improvements 
that I suspect people would want when possible.  For now I'm leaning 
towards just implementing the lowest common denominator, which I would 
set at ARMv4T: the versions prior to that were rarely used, and have 
strange behaviors that are hard to support[2].

Finally, there's the ARM Thumb/Thumb-2 compressed instruction sets that 
use a subset of the opcodes, which is pretty mandatory to support IMO.  
The extremely common Cortex-M microcontrollers exclusively implement 
Thumb; and regardless it's common to mix ARM32 and Thumb machine code to 
balance performance and code size.  I can see them being treated as 
completely separate instruction sets even if one is technically a subset 
of the other (obviously I would try and share code internally when 
possible, just exposing them as different architectures like arm64 and 
arm64_apple).  Also, interworking between ARM32 and Thumb relies pretty 
heavily on the linker to emit "veneer" and trampoline routines for jumps 
that are above a certain length, but I assume relying on that isn't an 
issue.

Sorry for being so verbose and getting into specifics of the 
architecture, but as I'm an outsider to the project and relatively 
inexperienced with compilers I don't want to jump in and start making 
decisions without discussing everything in-depth.

~nytpu

[1]: https://lists.sr.ht/~mpu/qbe/%3C20230414225658.2c577158@saphira%3E
[2]: For precedent on only supporting ARMv4 and above, see LLVM and GCC 
      as well as all the resources provided by ARM Holdings themselves.

-- 
Alex // nytpu
alex@nytpu.com
https://nytpu.com/ - gemini://nytpu.com/ - gopher://nytpu.com/
Details
Message ID
<CAAS8gYAg3YbaN14KnzzxhTPG+6pXoT=Bdt6tVLAEQhmrL+P5pA@mail.gmail.com>
In-Reply-To
<arpqvfamz425yybp5e3nybxrljdpfgrmf5xya4qbk54zzviibj@27ripil2acij> (view parent)
Hey Alex, some half-baked thoughts following on from the "8/16-bit Z80
support" thread
https://lists.sr.ht/~mpu/qbe/%3CCABdyTNG=ru7YTd32KbJYXv8xzO8iwbU8c=z4LcW2en0yhzWXDQ@mail.gmail.com%3E.

On Tue, Dec 24, 2024 at 9:52 AM Alex // nytpu <alex@nytpu.com> wrote:

> The main problem is, while 32-bit architectures in QBE are solved in
> theory ("just use `w` instead of `l` for pointers"), it seems to me to
> not really be solved in practice...
> Seems like I'd have to convince every frontend to do potentially
> quite a bit of work

Indeed.

Interestingly the QBE IL spec wanders into this territory by using the
'm' type in https://c9x.me/compile/doc/il.html#Memory for example,
where 'm' is interpreted as 'w'/'l' depending on the architecture
memory-address width.

I pondered a bit about abstracting QBE 'w'/'l' so that 'w' represents the
architecture-specific "integer" (C int type) width and 'l' the
architecture memory address (addr_t/size_t etc.) width, rather than
assuming fixed 32-bit/64-bit widths. I don't think this would be an
enormous change to QBE, but it would likely cause some semantic confusion
where frontends use 'l' for 64-bit non-memory address arithmetic, as
you point out.

Another option is to introduce a formal QBE type 'm' to disambiguate.

> The linked thread seemed to settle on forbidding `l` altogether

This is definitely the simplest solution for 32/32-bit architectures
like ARM32/x86. I haven't looked, but it might be instructive to look
at cproc - https://github.com/michaelforney/cproc - and get an idea of
how much pain is involved in generating 'w' ops for memory addresses.
This could drive more concrete discussion on potential QBE
changes/enhancements.

> There's also the question of how (or if) to support different
> architecture versions....  For now I'm leaning
> towards just implementing the lowest common denominator, which I would
> set at ARMv4T:

Definitely as a first pass. The current QBE approach of different
target (-t) variants will (likely) work ok, although it might cause an
explosion in the number of targets.

> Finally, there's the ARM Thumb/Thumb-2 compressed instruction sets...
> ... interworking between ARM32 and Thumb relies pretty
> heavily on the linker to emit "veneer" and trampoline routines for jumps
> that are above a certain length, but I assume relying on that isn't an
> issue.

QBE backends generate assembler, so as long as your backend generates
valid asm you can ignore all of this(?)

Hope this helps
R
Details
Message ID
<CAAS8gYCGhU4JJ1d8pt9pMStUO_7h4mVuAjF3NVcJfMV0pOvV5A@mail.gmail.com>
In-Reply-To
<CAAS8gYAg3YbaN14KnzzxhTPG+6pXoT=Bdt6tVLAEQhmrL+P5pA@mail.gmail.com> (view parent)
On Tue, Dec 24, 2024 at 10:19 AM Roland Paterson-Jones
<rolandpj@gmail.com> wrote:

> > The linked thread seemed to settle on forbidding `l` altogether
>
> This is definitely the simplest solution for 32/32-bit architectures
> like ARM32/x86. I haven't looked, but it might be instructive to look
> at cproc - https://github.com/michaelforney/cproc - and get an idea of
> how much pain is involved in generating 'w' ops for memory addresses.
> This could drive more concrete discussion on potential QBE
> changes/enhancements.

I took a quick look at cproc, and might change my mind now :)

Firstly, cproc defines fixed-width types essentially assuming
32/64-bit architecture mappings of long, long long etc.

https://github.com/michaelforney/cproc/blob/master/type.c#L28-L35

... then explicitly treats pointers as "typeulong"/size 8:

https://github.com/michaelforney/cproc/blob/master/qbe.c#L265-L268

https://github.com/michaelforney/cproc/blob/master/qbe.c#L460-L461

https://github.com/michaelforney/cproc/blob/master/type.c#L68

etc.

So from the cproc frontend perspective it might be easier to (still)
generate 'l' for long long on 32/32-bit, but map pointer/long types to
32-bit ('w'), and handle 'l' lowering in the QBE backend.
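
To make the mapping above concrete, here is a minimal sketch of how a frontend might pick a QBE base-type letter from a C type and the target's pointer size. The `ckind` enum and the `qbeclass` function are hypothetical names invented for illustration and are not cproc's actual code. The sketch also assumes that C `long` has the same width as a pointer (ILP32/LP64) and that `long long` stays 64-bit and is lowered by the backend.

```c
#include <assert.h>

/* Hypothetical C type kinds; not cproc's real representation. */
enum ckind { CK_INT, CK_LONG, CK_LONGLONG, CK_POINTER };

/*
 * Pick the QBE base-type letter for a C type, given the target's
 * pointer size in bytes (4 on a 32-bit target, 8 on a 64-bit one).
 */
static char
qbeclass(enum ckind k, int ptrsize)
{
    assert(ptrsize == 4 || ptrsize == 8);
    switch (k) {
    case CK_INT:
        return 'w';
    case CK_LONG:
    case CK_POINTER:
        /* assumption: long and pointers follow the machine width */
        return ptrsize == 8 ? 'l' : 'w';
    case CK_LONGLONG:
        /* always 64-bit; a 32-bit backend would have to lower it */
        return 'l';
    }
    return 0; /* unknown kind */
}
```

With this shape, the only place the frontend consults the target is the `ptrsize` argument, so an `l` that shows up for a pointer on a 32-bit target points at a single function.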

R
Details
Message ID
<2bd9bab5-4e21-4bc9-a48d-4e5c40e47385@notgull.net>
In-Reply-To
<CAAS8gYCGhU4JJ1d8pt9pMStUO_7h4mVuAjF3NVcJfMV0pOvV5A@mail.gmail.com> (view parent)
On 12/24/24 00:54, Roland Paterson-Jones wrote:

> So from the cproc frontend perspective it might be easier to (still)
> generate 'l' for long long on 32/32-bit, but map pointer/long types to
> 32-bit ('w'), and handle 'l' lowering in the QBE backend.

I'm fine with this, I don't think it would be *that* much extra code.
One extra pass at most. My main issue is: should we also handle lowering
'd' to 's' in QBE? I'm not sure about this one since it would mean
significantly more code, since we would effectively need to rewrite
softfp in QBE IR.
Details
Message ID
<CAAS8gYDMMicFFMwFqPaU3gDbnV4CxweRg-EJ39DtjBNRdz=+zw@mail.gmail.com>
In-Reply-To
<2bd9bab5-4e21-4bc9-a48d-4e5c40e47385@notgull.net> (view parent)
Hey John, and season's greetings to anyone listening.

On Tue, Dec 24, 2024 at 7:30 PM John Nunley <dev@notgull.net> wrote:
>
> On 12/24/24 00:54, Roland Paterson-Jones wrote:
>
> > So from the cproc frontend perspective it might be easier to (still)
> > generate 'l' for long long on 32/32-bit, but map pointer/long types to
> > 32-bit ('w'), and handle 'l' lowering in the QBE backend.
>
> I'm fine with this, I don't think it would be *that* much extra code.

Ack.

> My main issue is: should we also handle lowering
> 'd' to 's' in QBE?

I'm probably missing some arch fu, but why is this an issue? x86
(non-64-bit) always supported 80-bit fp, so double-precision fp is not
obviously an issue.

R
Details
Message ID
<CAAS8gYBk1mVhE=7HM8gbRomRT7C_XhNkuvkC7xxvdbga5hGFMA@mail.gmail.com>
In-Reply-To
<CAAS8gYDMMicFFMwFqPaU3gDbnV4CxweRg-EJ39DtjBNRdz=+zw@mail.gmail.com> (view parent)
On Tue, Dec 24, 2024 at 7:34 PM Roland Paterson-Jones
<rolandpj@gmail.com> wrote:

> I'm probably missing some arch fu, but why is this an issue? x86
> (non-64-bit) always supported 80-bit fp, so double-precision fp is not
> obviously an issue.

Ditto ARM32, sorry the above was too platform-specific. RISC-V has
specific opt-in support for float/'s' and double/'d' precision fp. I
presume it's up to the frontend to decide how to implement this.

I do remember some ARM32 archs supporting fp ops transparently
through instruction trapping (StrongARM?). This was a while ago though -
previous millennium :P
Details
Message ID
<c6e786aa-2a67-4886-8080-bf71600ef501@notgull.net>
In-Reply-To
<CAAS8gYBk1mVhE=7HM8gbRomRT7C_XhNkuvkC7xxvdbga5hGFMA@mail.gmail.com> (view parent)
>> I'm probably missing some arch fu, but why is this an issue? x86
>> (non-64-bit) always supported 80-bit fp, so double-precision fp is not
>> obviously an issue.
> 
> Ditto ARM32, sorry the above was too platform-specific. RiscV has
> specific opt-in support for float/'s' and double/'d' precision fp. I
> presume it's up to the frontend to decide how to implement this.

Ah, if there's a way to do this that would make sense. My main concern
would be ABI issues. Not sure how easy it is to lower an "l" to two "w"
parameters.
Details
Message ID
<3jwxmhm3vdgaxtsbcqygyaplce5cbhkayn3dnesrfjfj24svc7@a4fdnodlmm3u>
In-Reply-To
<CAAS8gYCGhU4JJ1d8pt9pMStUO_7h4mVuAjF3NVcJfMV0pOvV5A@mail.gmail.com> (view parent)
Hey,

I guess I should say up front, an ARM32 backend materializing from me is 
looking less and less likely by the day lol.  I'd poked at the QBE code 
before so I thought I could muddle through since I had the gist of what 
most of the code does, but it turns out it is extremely difficult to 
figure out how the code all fits together and *specifically* what many 
functions do.  And I can't easily just cargo cult from an existing 
backend since 32-bit requires a number of fundamental changes. Still 
persevering for the moment though.

Some things that slipped my mind in my first message:
1. Many/most ARM32 processors don't have floating-point support, I don't 
know if we want to handle that somehow or just wait for the assembler
to error out when assembling for a target without an FPU.

2. Many/most ARM32 processors don't have hardware dividers, I don't know 
if we want to have a software divider implementation or just error out.  
Probably will end up with the same solution we decide on with potential 
software floating-point support as I discuss below.
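
For a sense of scale, a software divider is small. The following is a minimal sketch of unsigned 32-bit restoring (shift-and-subtract) division in C. The names `udivmod32`, `udiv32`, and `urem32` are made up for this example; on ARM EABI targets, GCC and Clang normally call run-time helpers such as `__aeabi_uidiv` instead, and whether QBE should emit such calls is exactly the open question above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Restoring division: walk the dividend from its top bit down,
 * shifting each bit into a running remainder and subtracting the
 * divisor whenever it fits.  Division by zero is rejected here;
 * a real helper would need to define that case.
 */
static uint32_t
udivmod32(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0, r = 0;
    int i;

    assert(d != 0);
    for (i = 31; i >= 0; i--) {
        /* bit shifted out of r; the true remainder is then >= 2^32 > d */
        uint32_t carry = r >> 31;

        r = (r << 1) | ((n >> i) & 1);
        if (carry || r >= d) {
            r -= d; /* wraps to the correct value when carry is set */
            q |= (uint32_t)1 << i;
        }
    }
    if (rem)
        *rem = r;
    return q;
}

static uint32_t
udiv32(uint32_t n, uint32_t d)
{
    return udivmod32(n, d, 0);
}

static uint32_t
urem32(uint32_t n, uint32_t d)
{
    uint32_t r;

    udivmod32(n, d, &r);
    return r;
}
```

The loop always runs 32 iterations; production helpers usually skip leading zeros first, which this sketch leaves out for clarity.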


On 2024-12-24 10:19AM, Roland Paterson-Jones wrote:
>I pondered a bit about abstracting QBE 'w'/'l' so that 'w' represents the
>architecture-specific "integer" (C int type) width and 'l' the
>architecture memory address (addr_t/size_t etc.) width, rather than
>assuming fixed 32-bit/64-bit widths.
I don't really like the idea of not having types with specified widths 
at all, feels a bit C-undefined-behavior-y in terms of benefiting the 
compiler at the expense of the programmer (in this case also at the 
expense of the frontend compiler).

>Another option is to introduce a formal QBE type 'm' to disambiguate.
Yeah, I've long thought that QBE really should've just had an `m` type 
in the first place, the lack of which seemed to be a hypercorrection to 
the type system complexity of LLVM.  Perhaps not wise to introduce that 
now that the IR is out in the open though.  Or maybe now is the best 
time since we'd be expecting frontends to change their emitted IR 
regardless...

>> There's also the question of how (or if) to support different 
>> architecture versions...
>Definitely as a first pass. The current QBE approach of different 
>target (-t) variants will (likely) work ok, although it might cause an 
>explosion in the number of targets.
Yeah, and at any rate I've been thinking that other than Thumb vs.
Thumb-2 (the latter has pretty substantial improvements), the later
revisions of the full-width ARM32 instruction set don't really provide
*that* much performance improvement and may not be worth the complexity
of supporting them in particular, beyond some things like
floating-point support.

>> Finally, there's the ARM Thumb/Thumb-2 compressed instruction sets 
>> ... interworking between ARM32 and Thumb relies pretty heavily on the 
>> linker to emit "veneer" and trampoline routines for jumps that are 
>> above a certain length, but I assume relying on that isn't an issue.
>QBE backends generate assembler, so as long as your backend generates
>valid asm you can ignore all of this(?)
Oh yeah, it's transparent and the compiler can't really emit these 
routines anyways since it has no clue how it all will get linked 
together.  I just didn't know if there were any plans for making QBE 
work with some hypothetical simple linker where relying on complex 
things like context-sensitive code emission would hamper that.


On 2024-12-24 10:54AM, Roland Paterson-Jones wrote:
>So from the cproc frontend perspective it might be easier to (still) 
>generate 'l' for long long on 32/32-bit, but map pointer/long types to 
>32-bit ('w'), and handle 'l' lowering in the QBE backend.
Yeah, honestly this seems ideal just from guessing how most frontends 
would be implemented.  But as alluded to in my original message, I can 
already tangibly feel the pain of someone trying to debug something in a 
frontend and realizing it emitted an `l` instead of a `w` for a pointer 
in some edge-case.


On 2024-12-24 07:56AM, John Nunley wrote:
>My main issue is: should we also handle lowering 'd' to 's' in QBE? I'm 
>not sure about this one since it would mean significantly more code, 
>since we would effectively need to rewrite softfp in QBE IR.
I was going to note earlier in this message that I don't really think 
it's the place of the compiler backend to emit calls to some software FP 
library, and especially not to emit its own custom soft FP 
implementation.  It'd be best to just error out when encountering any 
unsupported float sizes, and expect the frontend to handle transforming 
them to software FP calls.  Even if the frontend doesn't support that, 
it's likely trivial for the programmer to just avoid doubles outright if 
they're unsupported by the target, and it's unlikely the frontend would 
emit any if the programmer isn't using them.

(Whereas with longs the frontend may possibly emit some even if the 
programmer only uses ints, where automatic lowering would be very 
convenient.)


On 2024-12-24 02:33PM, John Nunley wrote:
>Ah, if there's a way to do this that would make sense. My main concern 
>would be ABI issues. Not sure how easy it is to lower an "l" to two "w" 
>parameters.
At the very least it would be much easier for the component emitting 
assembly rather than the component emitting an intermediate language.  
For many situations it'd just be a matter of emitting opcodes setting or 
using the carry flag.  And I'd have to read the AAPCS again but IIRC 
long subroutine parameters should just be passed in two consecutive 
registers or on the stack in two slots, and returning a long returns the 
value in r0 and r1 rather than just r0.  So nothing that radical 
although there's certainly stickier edge cases I'm not thinking about 
(long multiplies right off the bat).
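
As a rough illustration of the lowering described above, the following C sketch models a 64-bit value as two 32-bit words and implements addition and truncating multiplication on the halves. The `pair` struct and the function names are hypothetical, and this models the arithmetic only; it says nothing about how QBE would represent the two halves internally. The addition mirrors an add followed by an add-with-carry (ADDS/ADC on ARM), and the multiply assumes a widening 32x32->64 multiply is available for the low words (UMULL on ARM), which is modelled here with `uint64_t`.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value split into two 32-bit words. */
struct pair {
    uint32_t lo, hi;
};

static struct pair
split(uint64_t v)
{
    struct pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

static uint64_t
join(struct pair p)
{
    return (uint64_t)p.hi << 32 | p.lo;
}

/* Add the low words, then the high words plus the carry out. */
static struct pair
add64(struct pair a, struct pair b)
{
    struct pair r;
    uint32_t carry;

    r.lo = a.lo + b.lo;
    carry = r.lo < a.lo; /* unsigned wrap-around means a carry occurred */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/*
 * Truncating 64x64 -> 64 multiply: the full product of the low words,
 * plus the two cross terms folded into the high word.  The hi*hi term
 * only affects bits 64 and up, so it is dropped.
 */
static struct pair
mul64(struct pair a, struct pair b)
{
    uint64_t ll = (uint64_t)a.lo * b.lo; /* widening multiply */
    struct pair r;

    r.lo = (uint32_t)ll;
    r.hi = (uint32_t)(ll >> 32) + a.lo * b.hi + a.hi * b.lo;
    return r;
}
```

Division, shifts by a variable amount, and comparisons need more care than this, which is where the stickier edge cases tend to sit.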

Thanks for the responses so far!
~nytpu

-- 
Alex // nytpu
alex@nytpu.com
https://nytpu.com/ - gemini://nytpu.com/ - gopher://nytpu.com/
Details
Message ID
<db495191-b69c-4a42-9c1e-08e476a51343@app.fastmail.com>
In-Reply-To
<3jwxmhm3vdgaxtsbcqygyaplce5cbhkayn3dnesrfjfj24svc7@a4fdnodlmm3u> (view parent)
On Wed, Dec 25, 2024, at 03:36, Alex // nytpu wrote:
> I guess I should say up front, an ARM32 backend materializing from me is 
> looking less and less likely by the day lol.  I'd poked at the QBE code 
> before so I thought I could muddle through since I had the gist of what 
> most of the code does, but it turns out it is extremely difficult to 
> figure out how the code all fits together and *specifically* what many 
> functions do.  And I can't easily just cargo cult from an existing 
> backend since 32-bit requires a number of fundamental changes. Still 
> persevering for the moment though.

I am currently spending all my qbe time integrating optimizations
by Roland. When that is done, I'd be keen on providing a simple
PoC with holes for you to plug.

> Yeah, I've long thought that QBE really should've just had an `m` type 
> in the first place, the lack of which seemed to be a hypercorrection to 
> the type system complexity of LLVM.

It isn't, an 'm' type would have unspecified size, making it
impossible to use it in a portable fashion. LLVM has pointer
types, but this is a false convenience (you have to know their
size since the bulk of the abi is on you to implement).

As a high level suggestion, I'd advise you to get something basic
going and to observe it function early. This is much more motivating
than aiming for breadth upfront. This is how qbe was written.
Details
Message ID
<E22435FC1219D9D7EB4F6C40A273E225@eigenstate.org>
In-Reply-To
<db495191-b69c-4a42-9c1e-08e476a51343@app.fastmail.com> (view parent)
Quoth Quentin Carbonneaux <quentin@c9x.me>:
> On Wed, Dec 25, 2024, at 03:36, Alex // nytpu wrote:
> > I guess I should say up front, an ARM32 backend materializing from me is 
> > looking less and less likely by the day lol.  I'd poked at the QBE code 
> > before so I thought I could muddle through since I had the gist of what 
> > most of the code does, but it turns out it is extremely difficult to 
> > figure out how the code all fits together and *specifically* what many 
> > functions do.  And I can't easily just cargo cult from an existing 
> > backend since 32-bit requires a number of fundamental changes. Still 
> > persevering for the moment though.
> 
> I am currently spending all my qbe time integrating optimizations
> by Roland. When that is done, I'd be keen on providing a simple
> PoC with holes for you to plug.
> 
> > Yeah, I've long thought that QBE really should've just had an `m` type 
> > in the first place, the lack of which seemed to be a hypercorrection to 
> > the type system complexity of LLVM.
> 
> It isn't, an 'm' type would have unspecified size, making it
> impossible to use it in a portable fashion. LLVM has pointer
> types, but this is a false convenience (you have to know their
> size since the bulk of the abi is on you to implement).
> 
> As a high level suggestion, I'd advise you to get something basic
> going and to observe it function early. This is much more motivating
> than aiming for breadth upfront. This is how qbe was written.
> 

for a proof of concept, I'd start with simply erroring on 64 bit
types; once that's done, it's easy to experiment with how to deal
with them.
Details
Message ID
<c019b58f-aef9-4b92-82dc-c297c6fe2351@notgull.net>
In-Reply-To
<E22435FC1219D9D7EB4F6C40A273E225@eigenstate.org> (view parent)
 > for a proof of concept, I'd start with simply erroring on 64 bit
 > types; once that's done, it's easy to experiment with how to deal
 > with them.

One additional note of complexity here. A lot of QBE is predicated on
the target's word alignment being 2^3. See the "NAlign" constant in
"all.h" and the various places in code labeled with this comment:

     /* specific to NAlign == 3 */

I am currently working on this, by the way, but my plan is to submit a
patch that adds an i386 backend along with the rest of this.
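
To illustrate what a hard-coded alignment exponent looks like versus a parametrized one, here is a generic sketch. It is not taken from the QBE source; `alignup` and its `nalign` parameter are hypothetical, and the only detail borrowed from the message above is that the exponent is currently 3 (2^3 = 8 bytes).

```c
#include <assert.h>

/*
 * Round `size` up to a multiple of 2^nalign.  Code written for a
 * fixed exponent of 3 would instead hard-code (size + 7) & ~7.
 */
static unsigned long
alignup(unsigned long size, int nalign)
{
    unsigned long a;

    assert(nalign >= 0 && nalign < 8);
    a = 1ul << nalign;
    return (size + a - 1) & ~(a - 1);
}
```

A 32-bit target would pass an exponent of 2 in places where the existing code assumes 3; for example, `alignup(9, 2)` gives 12 while `alignup(9, 3)` gives 16.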
Details
Message ID
<sohynxdzodejblwl3gtyvec7oiy2x3ndya33j3uott7elzkda6@7laktaffilqz>
In-Reply-To
<db495191-b69c-4a42-9c1e-08e476a51343@app.fastmail.com> (view parent)
On 2024-12-29 04:50PM, Quentin Carbonneaux wrote:
>I am currently spending all my qbe time integrating optimizations
>by Roland. When that is done, I'd be keen on providing a simple
>PoC with holes for you to plug.
Oh, really!  I'd appreciate it, but I don't want to saddle you with 
stuff that you probably wouldn't have been doing otherwise...

>...an 'm' type would have unspecified size, making it impossible to use 
>it in a portable fashion. LLVM has pointer types, but this is a false 
>convenience (you have to know their size since the bulk of the abi is 
>on you to implement).
Ah

>As a high level suggestion, I'd advise you to get something basic
>going and to observe it function early. This is much more motivating
>than aiming for breadth upfront. This is how qbe was written.
Oh yeah, I was definitely planning this route from the start.  My emails 
here were mostly just to get a sense of what direction all the nuances 
should end up going once the base implementation was all there.


On 2024-12-29 12:01PM, ori@eigenstate.org wrote:
>for a proof of concept, I'd start with simply erroring on 64 bit types; 
>once that's done, it's easy to experiment with how to deal with them.
Yep, the first thing I did was add a flag to Target indicating if it's a 
32-bit architecture and forbidding 64-bit types in that case.  I also 
just forbid parsing anything relating to floats if the Target's nfpr is 
0 (maybe not the best way and I do want to support floats for ARM32 in 
the future, it's just a stopgap).
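
The stopgap described above could look roughly like the following sketch. The struct is a hypothetical, cut-down stand-in for QBE's Target: `nfpr` is the field named in the message, while the `is32` flag, the `typeok` function, and the two sample targets with their register counts are invented here for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for QBE's Target, reduced to two fields. */
struct target_sketch {
    int is32; /* invented flag: set for 32-bit architectures */
    int nfpr; /* number of floating-point registers */
};

/* Made-up sample targets; the register counts are placeholders. */
static const struct target_sketch arm32_sketch = { 1, 0 };
static const struct target_sketch lp64_sketch = { 0, 16 };

/*
 * Return 1 if base type `k` ('w', 'l', 's' or 'd') is accepted on
 * target `t`, and 0 if the parser should reject it.
 */
static int
typeok(const struct target_sketch *t, char k)
{
    assert(t != 0);
    switch (k) {
    case 'w':
        return 1;
    case 'l':
        return !t->is32; /* forbid 64-bit integers on 32-bit targets */
    case 's':
    case 'd':
        return t->nfpr > 0; /* forbid floats without FP registers */
    }
    return 0; /* unknown type letter */
}
```

Keeping the check in one predicate means that later allowing `l` with backend lowering, or floats on targets with an FPU, would change a single function rather than scattered parser code.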

~nytpu

-- 
Alex // nytpu
alex@nytpu.com
https://nytpu.com/ - gemini://nytpu.com/ - gopher://nytpu.com/
Details
Message ID
<CAAS8gYDf17HqW+m6PEjHavX17v4e5oLVPWVV-GDry5dbcbLDOA@mail.gmail.com>
In-Reply-To
<CAAS8gYCGhU4JJ1d8pt9pMStUO_7h4mVuAjF3NVcJfMV0pOvV5A@mail.gmail.com> (view parent)
And... just to close, I'll change my mind again, for the new year :)

On Tue, Dec 24, 2024 at 10:54 AM Roland Paterson-Jones
<rolandpj@gmail.com> wrote:
>
> On Tue, Dec 24, 2024 at 10:19 AM Roland Paterson-Jones
> <rolandpj@gmail.com> wrote:
>
> > > The linked thread seemed to settle on forbidding `l` altogether
> >
> > This is definitely the simplest solution for 32/32-bit architectures

> I took a quick look at cproc, and might change my mind now :)
>
> Firstly, cproc defines fixed-width types essentially assuming
> 32/64-bit architecture mappings of long, long long etc.

So yes, basically getting cproc to still use "l" for C long[ long]
would be a minimal change in the (cproc) front-end, but after some
contemplation I do believe that the front-end should lower to the
"machine" width.

R