Hi Chris (and others),
An acquaintance nerd-sniped me yesterday by pointing to your branchless
UTF-8 decoder and asking whether a branchless *en*coder was possible.
I worked out how to do it after a bit of discussion and experimentation.
The writeup and code are at
<https://cceckman.com/writing/branchless-utf8-encoding/>.
I haven't benchmarked it -- my goal was just proof-of-concept,
not performance -- but I thought you might be interested.
Thanks for your writeup & inspiration!
-Charles

Thanks for sharing, Charles! It's interesting you included validation in
the encoder. I wouldn't have (and didn't below) in mine, but it does match
the spirit of the decoder. Mapping leading zeros onto a length is trickier
than I anticipated, and your table resolves that nicely.
To resolve your undefined bsr issue, maybe you could OR in a bit you don't
care about just for the bsr. Then its input is never zero, and the final
length result is unchanged. I see that done often with the GCC built-in.
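
A minimal sketch of that OR trick, assuming GCC or Clang's __builtin_clz
standing in for bsr and a 32-bit unsigned int. The name utf8_len and its
lookup table are hypothetical, made up for illustration; they are not taken
from either linked implementation, and no validation is done here.

```c
#include <stdint.h>

/* Hypothetical helper: UTF-8 encoded length (1..4) of a code point.
 * Assumes GCC/Clang and a 32-bit unsigned int. Returns 0 for values
 * that need more than 21 bits, i.e. beyond what 4 bytes can hold. */
static int utf8_len(uint32_t cp)
{
    /* Indexed by the number of significant bits in cp:
     *   1..7 bits -> 1 byte,   8..11 bits -> 2 bytes,
     *  12..16 bits -> 3 bytes, 17..21 bits -> 4 bytes.
     * Remaining entries (22..32) are zero-initialized. */
    static const unsigned char len[33] = {
        0,
        1, 1, 1, 1, 1, 1, 1,
        2, 2, 2, 2,
        3, 3, 3, 3, 3,
        4, 4, 4, 4, 4
    };

    /* __builtin_clz(0) is undefined, so OR in bit 0 first. That only
     * changes the highest set bit when cp == 0, which then counts as
     * one significant bit and still maps to length 1. */
    int bits = 32 - __builtin_clz((unsigned)cp | 1u);
    return len[bits];
}
```

For example, utf8_len(0) and utf8_len(0x7F) both give 1, while
utf8_len(0x80) gives 2 and utf8_len(0x10000) gives 4.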
You got some ideas turning in my head, and I came up with this:
https://github.com/skeeto/scratch/blob/master/misc/utf8_branchless.c
Compared to my usual encoder, the results were about what I expected, with
the branching version faster in the typical ASCII-only case, but the new
one faster if input occasionally has code points outside the ASCII range