This significantly improves parsing performance for massive functions
with a huge number of temporaries. Parsing the 86MiB IL produced
by cproc during zig bootstrap drops from 17m15s to 2.5s (over 400x
speedup).
The speedup is much smaller for IL produced from normal non-autogenerated
C code. Parsing the sqlite3 amalgamation drops from 0.40s to 0.33s.
---
I reused the TMask constant for the initial size of the hashtable,
but perhaps now it should have a different name?
parse.c | 30 +++++++++++++++++++-----------
1 file changed, 19 insertions(+), 11 deletions(-)
diff --git a/parse.c b/parse.c
index a909b7d..0154ca4 100644
--- a/parse.c+++ b/parse.c
@@ -152,7 +152,8 @@ static struct {
static int lnum;
static Fn *curf;
-static int tmph[TMask+1];+static int *tmph;+static int tmphcap;static Phi **plink;
static Blk *curb;
static Blk **blink;
@@ -384,19 +385,26 @@ expect(int t)
static Ref
tmpref(char *v)
{
- int t, *h;-- h = &tmph[hash(v) & TMask];- t = *h;- if (t) {+ int t, i;++ if (tmphcap/2 <= curf->ntmp-Tmp0) {+ free(tmph);+ tmphcap = tmphcap ? tmphcap*2 : TMask+1;+ tmph = emalloc(tmphcap * sizeof *tmph);+ for (t=Tmp0; t<curf->ntmp; t++) {+ i = hash(curf->tmp[t].name) & (tmphcap-1);+ for (; tmph[i]; i=(i+1) & (tmphcap-1))+ ;+ tmph[i] = t;+ }+ }+ for (i=hash(v) & (tmphcap-1); tmph[i]; i=(i+1) & (tmphcap-1)) {+ t = tmph[i]; if (strcmp(curf->tmp[t].name, v) == 0)
return TMP(t);
- for (t=curf->ntmp-1; t>=Tmp0; t--)- if (strcmp(curf->tmp[t].name, v) == 0)- return TMP(t); }
t = curf->ntmp;
- *h = t;+ tmph[i] = t; newtmp(0, Kx, curf);
strcpy(curf->tmp[t].name, v);
return TMP(t);
@@ -926,7 +934,7 @@ parsefn(Lnk *lnk)
b->dlink = 0; /* was trashed by findblk() */
for (i=0; i<BMask+1; ++i)
blkh[i] = 0;
- memset(tmph, 0, sizeof tmph);+ memset(tmph, 0, tmphcap * sizeof *tmph); typecheck(curf);
return curf;
}
--
2.44.0
On Sat, Apr 13, 2024, at 12:28, Michael Forney wrote:
> This significantly improves parsing performance for massive functions> with a huge number of temporaries.
Thanks a lot, I merged your patch in master.
> during zig bootstrap drops from 17m15s to 2.5s (over 400x> speedup).
Is there a place where I can read more about this line of work?
Quentin Carbonneaux <quentin@c9x.me> wrote:
> > during zig bootstrap drops from 17m15s to 2.5s (over 400x
> > speedup).
>
> Is there a place where I can read more about this line of work?
I just tried it since andrewrk mentioned it in #musl.
To try yourself, you can clone the zig repository, then
cproc -o bootstrap bootstrap.c
CC=cproc ./bootstrap
I also needed a small patch to prevent the generated C from loading
an uninitialized variably in a dead code path (slot %.174 is read
but never stored to).
It seems that everything is able to compile except for one function
`f17`, which gives
rega.c: dying: cannot have more moves than registers
I haven't yet tried to minimize the function yet.
diff --git a/stage1/wasm2c.c b/stage1/wasm2c.c
index 10c0a0c3de..942ac40e85 100644
--- a/stage1/wasm2c.c+++ b/stage1/wasm2c.c
@@ -552,7 +552,7 @@ int main(int argc, char **argv) {
FuncGen_indent(&fg, out);
(void)FuncGen_localDeclare(&fg, out,
func_type->result->types[result_i]);
- fputs(";\n", out);+ fputs(" = 0;\n", out); }
FuncGen_blockBegin(&fg, out, WasmOpcode_block, funcs[func_i].type_idx);
while (!FuncGen_done(&fg)) {
Michael Forney <mforney@mforney.org> wrote:
> It seems that everything is able to compile except for one function> `f17`, which gives> > rega.c: dying: cannot have more moves than registers> > I haven't yet tried to minimize the function yet.
The result after creduce+ddmin is attached.