Hi everyone,
I'm interested in attempting to implement ARM32 as a target. While I
have poked at the codebase a bit, no guarantees I could get it done.
Although if I do get it implemented then I'm willing to commit to
maintaining it as well, if necessary.
The main problem is, while 32-bit architectures in QBE are solved in
theory ("just use `w` instead of `l` for pointers"), it seems to me to
not really be solved in practice. It'd be pretty disappointing to get
the backend implemented and working, but then not be able to use it for
anything because no frontend implements the de facto incompatible 32-bit
IR. Seems like I'd have to convince every frontend to do potentially
quite a bit of work---or attempt to do it all myself---in order to
support 32-bit backends. Not only just switching out the type used for
pointers, but also possibly manually implementing narrowing for longs.
As discussed in a previous thread[1], there's the question of if longs
should be supported. It doesn't seem great to forbid `l` completely,
since supporting long integers is generally still desirable on 32-bit
architectures and it'd be a pain to expect every single frontend to
manually implement narrowing themselves (maybe not too bad if they
already have narrowing for long longs implemented though). But, if you
allow `l` (narrowing internally), then it could make it difficult to
debug a frontend if a pointer accidentally gets emitted as `l` instead
of `w`. The linked thread seemed to settle on forbidding `l` altogether
but I wouldn't feel comfortable summarily making that decision myself.
There's also the question of how (or if) to support different
architecture versions. I'm personally interested in ARMv4T and ARMv5TE
targets specifically, but they're very old and there are later revisions
that have instructions that offer performance/code size improvements
that I suspect people would want when possible. For now I'm leaning
towards just implementing the lowest common denominator, which I would
set at ARMv4T: the versions prior to that were rarely used, and have
strange behaviors that are hard to support[2].
Finally, there's the ARM Thumb/Thumb-2 compressed instruction sets that
use a subset of the opcodes, which is pretty mandatory to support IMO.
The extremely common Cortex-M microcontrollers exclusively implement
Thumb; and regardless it's common to mix ARM32 and Thumb machine code to
balance performance and code size. I can see them being treated as
completely separate instruction sets even if one is technically a subset
of the other (obviously I would try and share code internally when
possible, just exposing them as different architectures like arm64 and
arm64_apple). Also, interworking between ARM32 and Thumb relies pretty
heavily on the linker to emit "veneer" and trampoline routines for jumps
that are above a certain length, but I assume relying on that isn't an
issue.
Sorry for being so verbose and getting into specifics of the
architecture, but as I'm an outsider to the project and relatively
inexperienced with compilers I don't want to jump in and start making
decisions without discussing everything in-depth.
~nytpu
[1]: https://lists.sr.ht/~mpu/qbe/%3C20230414225658.2c577158@saphira%3E
[2]: For precedent on only supporting ARMv4 and above, see LLVM and GCC
as well as all the resources provided by ARM Holdings themselves.
--
Alex // nytpu
alex@nytpu.com
https://nytpu.com/ - gemini://nytpu.com/ - gopher://nytpu.com/
Hey Alex, some half-baked thoughts following on from the "8/16-bit Z80
support" thread
https://lists.sr.ht/~mpu/qbe/%3CCABdyTNG=ru7YTd32KbJYXv8xzO8iwbU8c=z4LcW2en0yhzWXDQ@mail.gmail.com%3E.
On Tue, Dec 24, 2024 at 9:52 AM Alex // nytpu <alex@nytpu.com> wrote:
> The main problem is, while 32-bit architectures in QBE are solved in
> theory ("just use `w` instead of `l` for pointers"), it seems to me to
> not really be solved in practice...
> Seems like I'd have to convince every frontend to do potentially
> quite a bit of work
Indeed.
Interestingly the QBE IL spec wanders into this territory by using the
'm' type in https://c9x.me/compile/doc/il.html#Memory for example,
where 'm' is interpreted as 'w'/'l' depending on the architecture
memory-address width.
I pondered a bit about abstracting QBE 'w'/'l' to represent
architecture-specific "integer" (C int type) width, and 'l' to be
architecture memory address (addr_t/size_t etc.) width, rather than
fixed 32-bit/64-bit assumption. I don't think this would be an
enormous change to QBE, but likely would cause some semantic confusion
where frontends use 'l' for 64-bit non-memory address arithmetic, as
you point out.
Another option is to introduce a formal QBE type 'm' to disambiguate.
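
To make the 'm' idea concrete, here is a minimal sketch in C of how a placeholder class could be resolved to a fixed-width one. The function name `resolve_cls` and the `ptr_bits` parameter are hypothetical and not taken from QBE; the sketch only assumes that 'm' stands for 'w' on 32-bit targets and 'l' on 64-bit ones, as the IL documentation describes.

```c
/* Resolve a class letter to a fixed-width one.
 * 'm' is treated as "address-sized": 'l' when pointers are 64 bits wide,
 * 'w' otherwise. Every other letter is returned unchanged.
 * Hypothetical helper, not part of QBE. */
static char resolve_cls(char cls, int ptr_bits) {
    if (cls != 'm')
        return cls;
    return ptr_bits == 64 ? 'l' : 'w';
}
```

A frontend could then emit 'm' for every pointer and leave the choice to the backend, at the cost of no longer knowing the size of the value it just emitted.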
> The linked thread seemed to settle on forbidding `l` altogether
This is definitely the simplest solution for 32/32-bit architectures
like ARM32/x86. I haven't looked, but it might be instructive to look
at cproc - https://github.com/michaelforney/cproc - and get an idea of
how much pain is involved in generating 'w' ops for memory addresses.
This could drive more concrete discussion on potential QBE
changes/enhancements.
> There's also the question of how (or if) to support different
> architecture versions.... For now I'm leaning
> towards just implementing the lowest common denominator, which I would
> set at ARMv4T:
Definitely as a first pass. The current QBE approach of different
target (-t) variants will (likely) work ok, although it might cause an
explosion in the number of targets.
> Finally, there's the ARM Thumb/Thumb-2 compressed instruction sets...
> ... interworking between ARM32 and Thumb relies pretty
> heavily on the linker to emit "veneer" and trampoline routines for jumps
> that are above a certain length, but I assume relying on that isn't an
> issue.
QBE backends generate assembler, so as long as your backend generates
valid asm you can ignore all of this(?)
Hope this helps
R
On 12/24/24 00:54, Roland Paterson-Jones wrote:
> So from the cproc frontend perspective it might be easier to (still)
> generate 'l' for long long on 32/32-bit, but map pointer/long types to
> 32-bit ('w'), and handle 'l' lowering in the QBE backend.
I'm fine with this, I don't think it would be *that* much extra code.
One extra pass at most. My main issue is: should we also handle lowering
'd' to 's' in QBE? I'm not sure about this one since it would mean
significantly more code, since we would effectively need to rewrite
softfp in QBE IR.
Hey John, and season's greetings to anyone listening.
On Tue, Dec 24, 2024 at 7:30 PM John Nunley <dev@notgull.net> wrote:
> On 12/24/24 00:54, Roland Paterson-Jones wrote:
> > So from the cproc frontend perspective it might be easier to (still)
> > generate 'l' for long long on 32/32-bit, but map pointer/long types to
> > 32-bit ('w'), and handle 'l' lowering in the QBE backend.
>
> I'm fine with this, I don't think it would be *that* much extra code.
Ack.
> My main issue is: should we also handle lowering
> 'd' to 's' in QBE?
I'm probably missing some arch fu, but why is this an issue? x86
(non-64-bit) always supported 80-bit fp, so double-precision fp is not
obviously an issue.
R
On Tue, Dec 24, 2024 at 7:34 PM Roland Paterson-Jones
<rolandpj@gmail.com> wrote:
> I'm probably missing some arch fu, but why is this an issue? x86
> (non-64-bit) always supported 80-bit fp, so double-precision fp is not
> obviously an issue.
Ditto ARM32, sorry the above was too platform-specific. RiscV has
specific opt-in support for float/'s' and double/'d' precision fp. I
presume it's up to the frontend to decide how to implement this.
I do remember some ARM32 archs supporting fp ops transparently
through instruction trapping (StrongARM?). This was a while ago tho -
previous millennium :P
>> I'm probably missing some arch fu, but why is this an issue? x86
>> (non-64-bit) always supported 80-bit fp, so double-precision fp is not
>> obviously an issue.
>
> Ditto ARM32, sorry the above was too platform-specific. RiscV has
> specific opt-in support for float/'s' and double/'d' precision fp. I
> presume it's up to the frontend to decide how to implement this.
Ah, if there's a way to do this that would make sense. My main concern
would be ABI issues. Not sure how easy it is to lower an "l" to two "w"
parameters.
Hey,
I guess I should say up front, an ARM32 backend materializing from me is
looking less and less likely by the day lol. I'd poked at the QBE code
before so I thought I could muddle through since I had the gist of what
most of the code does, but it turns out it is extremely difficult to
figure out how the code all fits together and *specifically* what many
functions do. And I can't easily just cargo cult from an existing
backend since 32-bit requires a number of fundamental changes. Still
persevering for the moment though.
Some things that slipped my mind in my first message:
1. Many/most ARM32 processors don't have floating-point support, I don't
know if we want to handle that somehow or just wait for the assembler to
error out when assembling for a target without an FPU.
2. Many/most ARM32 processors don't have hardware dividers, I don't know
if we want to have a software divider implementation or just error out.
Probably will end up with the same solution we decide on with potential
software floating-point support as I discuss below.
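
For a sense of scale, a software divider is small. The following is a minimal sketch in C of restoring (shift-and-subtract) unsigned division; the name `soft_udiv` is hypothetical and nothing here is taken from QBE. Toolchains for divider-less ARM cores usually call a runtime helper instead (the ARM run-time ABI names one `__aeabi_uidiv`), so this only shows what such a helper has to do.

```c
#include <stdint.h>

/* Restoring division: walk the dividend from its top bit down, shifting
 * each bit into a running remainder and subtracting the divisor whenever
 * it fits. The divisor d must be nonzero. If rem is non-null it receives
 * the remainder. Hypothetical helper, not part of QBE. */
static uint32_t soft_udiv(uint32_t n, uint32_t d, uint32_t *rem) {
    uint32_t q = 0;
    uint32_t r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);   /* bring down the next bit */
        if (r >= d) {
            r -= d;                        /* divisor fits: subtract it */
            q |= (uint32_t)1 << i;         /* and set this quotient bit */
        }
    }
    if (rem)
        *rem = r;
    return q;
}
```

Whether the backend emits a call to something like this or simply rejects `div`/`udiv` is the same policy question as for software floating point.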
On 2024-12-24 10:19AM, Roland Paterson-Jones wrote:
>I pondered a bit about abstracting QBE 'w'/'l' to represent
>architecture-specific "integer" (C int type) width, and 'l' to be
>architecture memory address (addr_t/size_t etc.) width, rather than
>fixed 32-bit/64-bit assumption.
I don't really like the idea of not having types with specified widths
at all, feels a bit C-undefined-behavior-y in terms of benefiting the
compiler at the expense of the programmer (in this case also at the
expense of the frontend compiler).
>Another option is to introduce a formal QBE type 'm' to disambiguate.
Yeah, I've long thought that QBE really should've just had an `m` type
in the first place, the lack of which seemed to be a hypercorrection to
the type system complexity of LLVM. Perhaps not wise to introduce that
now that the IR is out in the open though. Or maybe now is the best
time since we'd be expecting frontends to change their emitted IR
regardless...
>> There's also the question of how (or if) to support different
>> architecture versions...
>Definitely as a first pass. The current QBE approach of different
>target (-t) variants will (likely) work ok, although it might cause an
>explosion in the number of targets.
Yeah, and at any rate I've been thinking that other than Thumb vs.
Thumb-2 (the latter has pretty substantial improvements), the later
revisions of the full-width ARM32 instruction set don't really provide
*that* much performance improvement and may not be worth the complexity
of supporting them in particular. Beyond some things like
floating-point support.
>> Finally, there's the ARM Thumb/Thumb-2 compressed instruction sets
>> ... interworking between ARM32 and Thumb relies pretty heavily on the
>> linker to emit "veneer" and trampoline routines for jumps that are
>> above a certain length, but I assume relying on that isn't an issue.
>QBE backends generate assembler, so as long as your backend generates
>valid asm you can ignore all of this(?)
Oh yeah, it's transparent and the compiler can't really emit these
routines anyways since it has no clue how it all will get linked
together. I just didn't know if there were any plans for making QBE
work with some hypothetical simple linker where relying on complex
things like context-sensitive code emission would hamper that.
On 2024-12-24 10:54AM, Roland Paterson-Jones wrote:
>So from the cproc frontend perspective it might be easier to (still)
>generate 'l' for long long on 32/32-bit, but map pointer/long types to
>32-bit ('w'), and handle 'l' lowering in the QBE backend.
Yeah, honestly this seems ideal just from guessing how most frontends
would be implemented. But as alluded to in my original message, I can
already tangibly feel the pain of someone trying to debug something in a
frontend and realizing it emitted an `l` instead of a `w` for a pointer
in some edge-case.
On 2024-12-24 07:56AM, John Nunley wrote:
>My main issue is: should we also handle lowering 'd' to 's' in QBE? I'm
>not sure about this one since it would mean significantly more code,
>since we would effectively need to rewrite softfp in QBE IR.
I was going to note earlier in this message that I don't really think
it's the place of the compiler backend to emit calls to some software FP
library, and especially not to emit its own custom soft FP
implementation. It'd be best to just error out when encountering any
unsupported float sizes, and expect the frontend to handle transforming
them to software FP calls. Even if the frontend doesn't support that,
it's likely trivial for the programmer to just avoid doubles outright if
they're unsupported by the target, and it's unlikely the frontend would
emit any if the programmer isn't using them.
(Whereas with longs the frontend may possibly emit some even if the
programmer only uses ints, where automatic lowering would be very
convenient.)
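
As an illustration of what "the frontend handles it" could look like, here is a minimal sketch in C that maps a double-precision arithmetic operation to the name of a library routine to call instead. The helper names are the double-precision routines from the ARM run-time ABI; the function `softfp_helper` and the idea of keying the table on QBE instruction names are assumptions made for this example only.

```c
#include <stddef.h>
#include <string.h>

/* Return the name of the soft-float routine a frontend could call in
 * place of a double-precision IL instruction, or NULL when the table has
 * no entry for it. Hypothetical helper, not part of QBE or any frontend. */
static const char *softfp_helper(const char *op) {
    static const struct {
        const char *op;
        const char *helper;
    } map[] = {
        { "add", "__aeabi_dadd" },
        { "sub", "__aeabi_dsub" },
        { "mul", "__aeabi_dmul" },
        { "div", "__aeabi_ddiv" },
    };
    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++) {
        if (strcmp(op, map[i].op) == 0)
            return map[i].helper;
    }
    return NULL;
}
```

With a table like this the frontend would emit a `call` to the helper, passing the operands as integers, and the backend never sees a `d` value at all.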
On 2024-12-24 02:33PM, John Nunley wrote:
>Ah, if there's a way to do this that would make sense. My main concern
>would be ABI issues. Not sure how easy it is to lower an "l" to two "w"
>parameters.
At the very least it would be much easier for the component emitting
assembly rather than the component emitting an intermediate language.
For many situations it'd just be a matter of emitting opcodes setting or
using the carry flag. And I'd have to read the AAPCS again but IIRC
long subroutine parameters should just be passed in two consecutive
registers or on the stack in two slots, and returning a long returns the
value in r0 and r1 rather than just r0. So nothing that radical
although there's certainly stickier edge cases I'm not thinking about
(long multiplies right off the bat).
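
A minimal sketch of the lowering described above, written in C rather than ARM assembly: a 64-bit `l` is held as two 32-bit halves, addition propagates a carry from the low half into the high half (what an ADDS/ADC pair does), and the low 64 bits of a product come from one widening multiply plus two truncating cross products. The `Pair` type and the `pair_*` names are made up for illustration and are not part of QBE.

```c
#include <stdint.h>

/* A 64-bit value split into two 32-bit halves, the way a backend might
 * keep an `l` in a pair of 32-bit registers. Hypothetical type. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} Pair;

static Pair pair_from(uint64_t v) {
    Pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

static uint64_t pair_to(Pair p) {
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit add: add the low halves, then add the high halves plus carry. */
static Pair pair_add(Pair a, Pair b) {
    Pair r;
    r.lo = a.lo + b.lo;
    uint32_t carry = r.lo < a.lo;   /* unsigned wraparound means carry out */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Low 64 bits of a 64x64 multiply. Only lo*lo needs a widening multiply;
 * the cross products only affect the high half, so they can wrap. */
static Pair pair_mul(Pair a, Pair b) {
    uint64_t low = (uint64_t)a.lo * b.lo;
    Pair r;
    r.lo = (uint32_t)low;
    r.hi = (uint32_t)(low >> 32) + a.lo * b.hi + a.hi * b.lo;
    return r;
}
```

The multiply shows why it is one of the stickier cases: it needs a 32x32->64 instruction (or a helper routine on cores without one), whereas the add is just two instructions.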
Thanks for the responses so far!
~nytpu
--
Alex // nytpu
alex@nytpu.com
https://nytpu.com/ - gemini://nytpu.com/ - gopher://nytpu.com/
On Wed, Dec 25, 2024, at 03:36, Alex // nytpu wrote:
> I guess I should say up front, an ARM32 backend materializing from me is
> looking less and less likely by the day lol. I'd poked at the QBE code
> before so I thought I could muddle through since I had the gist of what
> most of the code does, but it turns out it is extremely difficult to
> figure out how the code all fits together and *specifically* what many
> functions do. And I can't easily just cargo cult from an existing
> backend since 32-bit requires a number of fundamental changes. Still
> persevering for the moment though.
I am currently spending all my qbe time integrating optimizations
by Roland. When that is done, I'd be keen on providing a simple
PoC with holes for you to plug.
> Yeah, I've long thought that QBE really should've just had an `m` type
> in the first place, the lack of which seemed to be a hypercorrection to
> the type system complexity of LLVM.
It isn't, an 'm' type would have unspecified size, making it
impossible to use it in a portable fashion. LLVM has pointer
types, but this is a false convenience (you have to know their
size since the bulk of the abi is on you to implement).
As a high level suggestion, I'd advise you to get something basic
going and to observe it function early. This is much more motivating
than aiming for breadth upfront. This is how qbe was written.
Quoth Quentin Carbonneaux <quentin@c9x.me>:
> On Wed, Dec 25, 2024, at 03:36, Alex // nytpu wrote:
> > I guess I should say up front, an ARM32 backend materializing from me is
> > looking less and less likely by the day lol. I'd poked at the QBE code
> > before so I thought I could muddle through since I had the gist of what
> > most of the code does, but it turns out it is extremely difficult to
> > figure out how the code all fits together and *specifically* what many
> > functions do. And I can't easily just cargo cult from an existing
> > backend since 32-bit requires a number of fundamental changes. Still
> > persevering for the moment though.
>
> I am currently spending all my qbe time integrating optimizations
> by Roland. When that is done, I'd be keen on providing a simple
> PoC with holes for you to plug.
>
> > Yeah, I've long thought that QBE really should've just had an `m` type
> > in the first place, the lack of which seemed to be a hypercorrection to
> > the type system complexity of LLVM.
>
> It isn't, an 'm' type would have unspecified size, making it
> impossible to use it in a portable fashion. LLVM has pointer
> types, but this is a false convenience (you have to know their
> size since the bulk of the abi is on you to implement).
>
> As a high level suggestion, I'd advise you to get something basic
> going and to observe it function early. This is much more motivating
> than aiming for breadth upfront. This is how qbe was written.
for a proof of concept, I'd start with simply erroring on 64 bit
types; once that's done, it's easy to experiment with how to deal
with them.
> for a proof of concept, I'd start with simply erroring on 64 bit
> types; once that's done, it's easy to experiment with how to deal
> with them.
One additional note of complexity here. A lot of QBE is predicated on
the target's word alignment being 2^3. See the "NAlign" constant in
"all.h" and the various places in code labeled with this comment:
/* specific to NAlign == 3 */
I am currently working on this, by the way, but my plan is to submit a
patch that adds an i386 backend along with the rest of this.
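
To show what depends on that constant, here is a minimal sketch in C of rounding a size up to a multiple of 2^nalign bytes. It assumes, as described above, that NAlign is the base-2 logarithm of the word alignment; the function `align_up` and the idea of passing the exponent as a parameter are hypothetical and not how QBE is written.

```c
#include <stdint.h>

/* Round size up to the next multiple of 2^nalign bytes.
 * nalign == 3 gives the 8-byte granularity that the existing code
 * assumes; nalign == 2 would give the 4-byte granularity a 32-bit
 * target might prefer (an assumption, not a QBE setting). */
static uint32_t align_up(uint32_t size, unsigned nalign) {
    uint32_t mask = ((uint32_t)1 << nalign) - 1;
    return (size + mask) & ~mask;
}
```

For example, a 9-byte object occupies 16 bytes with an exponent of 3 but only 12 with an exponent of 2, so any code that hard-codes shifts by 3 would miscompute offsets on a target using the smaller value.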
On 2024-12-29 04:50PM, Quentin Carbonneaux wrote:
>I am currently spending all my qbe time integrating optimizations
>by Roland. When that is done, I'd be keen on providing a simple
>PoC with holes for you to plug.
Oh, really! I'd appreciate it, but I don't want to saddle you with
stuff that you probably wouldn't have been doing otherwise...
>...an 'm' type would have unspecified size, making it impossible to use
>it in a portable fashion. LLVM has pointer types, but this is a false
>convenience (you have to know their size since the bulk of the abi is
>on you to implement).
Ah
>As a high level suggestion, I'd advise you to get something basic
>going and to observe it function early. This is much more motivating
>than aiming for breadth upfront. This is how qbe was written.
Oh yeah, I was definitely planning this route from the start. My emails
here were mostly just to get a sense of what direction all the nuances
should end up going once the base implementation was all there.
On 2024-12-29 12:01PM, ori@eigenstate.org wrote:
>for a proof of concept, I'd start with simply erroring on 64 bit types;
>once that's done, it's easy to experiment with how to deal with them.
Yep, the first thing I did was add a flag to Target indicating if it's a
32-bit architecture and forbidding 64-bit types in that case. I also
just forbid parsing anything relating to floats if the Target's nfpr is
0 (maybe not the best way and I do want to support floats for ARM32 in
the future, it's just a stopgap).
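
The stopgap described above can be sketched in C roughly as follows. Only the field name `nfpr` comes from the message; the pared-down `Target` struct, the `is32` flag name, and the `cls_allowed` function are hypothetical stand-ins and do not reflect the real QBE definitions.

```c
#include <stdbool.h>

/* Hypothetical, pared-down stand-in for a backend description. */
typedef struct {
    const char *name;
    bool is32;   /* invented flag: target has 32-bit registers only */
    int nfpr;    /* number of floating-point registers, 0 if none */
} Target;

/* Decide whether a base type letter may appear in the input for this
 * target: 'w' is always fine, 'l' is rejected on 32-bit targets, and the
 * float classes 's' and 'd' are rejected when there are no FP registers. */
static bool cls_allowed(const Target *t, char cls) {
    switch (cls) {
    case 'w':
        return true;
    case 'l':
        return !t->is32;
    case 's':
    case 'd':
        return t->nfpr > 0;
    default:
        return false;
    }
}
```

A parser could call such a check wherever it reads a type letter and report an error naming the target, which also gives frontend authors an early, explicit failure when an `l` slips out for a pointer.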
~nytpu
--
Alex // nytpu
alex@nytpu.com
https://nytpu.com/ - gemini://nytpu.com/ - gopher://nytpu.com/
And... just to close, I'll change my mind again, for the new year :)
On Tue, Dec 24, 2024 at 10:54 AM Roland Paterson-Jones
<rolandpj@gmail.com> wrote:
> On Tue, Dec 24, 2024 at 10:19 AM Roland Paterson-Jones
> <rolandpj@gmail.com> wrote:
>
> > > The linked thread seemed to settle on forbidding `l` altogether
> >
> > This is definitely the simplest solution for 32/32-bit architectures
>
> I took a quick look at cproc, and might change my mind now :)
>
> Firstly, cproc defines fixed-width types essentially assuming
> 32/64-bit architecture mappings of long, long long etc.
So yes, getting cproc to still use "l" for C long[ long] would be a
minimal change in the (cproc) front-end, but after some contemplation
I do believe that the front-end should lower to the "machine" width.
R