~skeeto/public-inbox

Branchless UTF-8 encoder improvement

Message ID
<FV3DFVEaZkRiMhweIh_rM8f92Jm1N8CEe7rk1d0HdE8RbQbxpsOnedaW6DNV27Nzo5v5LJGfCkF3V-aj-7f4JdJUdIxjEkFCRBnfUCybV60=@proton.me>
Hello Chris,

Your branchless UTF-8 decoder expects a buffer whose size is divisible by 4, which makes the API awkward for the user. Could you instead have the user supply the size of the buffer, and keep a local 4-byte buffer into which you move each byte from the user-supplied buffer if it is in range, or 0 otherwise? The move should be branchless, of course; you just have to bully the compiler into turning a ternary into a cmov. That way the API is nice and the code is still branchless.
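Roughly what I have in mind, as an untested sketch rather than a patch against your decoder. The name `load4_padded` is made up for illustration, and I am assuming the caller only invokes it when at least one input byte remains (`len >= 1`). Instead of a ternary I use a mask, so the out-of-range case reads `src[0]` and discards it rather than reading past the end; whether a given compiler actually emits branch-free code for this is something I have not checked.

```c
#include <stddef.h>

/* Hypothetical helper: copy up to 4 bytes from src into out, filling the
 * rest with zeros. Assumes len >= 1 so that src[0] is always readable. */
static void load4_padded(unsigned char out[4],
                         const unsigned char *src, size_t len)
{
    for (size_t i = 0; i < 4; i++) {
        size_t in_range = i < len;             /* 1 if src[i] exists, else 0 */
        size_t mask = (size_t)0 - in_range;    /* all ones, or all zeros */
        size_t idx = i & mask;                 /* i if in range, else 0 */
        out[i] = (unsigned char)(src[idx] & mask);
    }
}
```

The decoder could then run on `out` unchanged, since it always sees 4 readable bytes.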
Message ID
<IiIqt7gsl8HN81XcRr5CHI99FzM44Ff1ZRbu2fKwnHTDVDCACKkQDp4mbyiTSiGgGJ3hjif8w8VIAYCAqhMWyMnPX4o0k9b2m1CnRBmgd5Q=@proton.me>
In-Reply-To
<FV3DFVEaZkRiMhweIh_rM8f92Jm1N8CEe7rk1d0HdE8RbQbxpsOnedaW6DNV27Nzo5v5LJGfCkF3V-aj-7f4JdJUdIxjEkFCRBnfUCybV60=@proton.me>
Correction: you do not need to bully the compiler nearly as much as I thought. You can just make a zeroed 4-byte buffer and then memcpy into it as many bytes as the length of the string or the size of the buffer, whichever is smaller. memcpy can usually optimize into a single instruction, and the comparison is trivially turned into a cmov on all compilers. What do you think?
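In code, the idea is something like the following. This is only a sketch; `load4_memcpy` is a name I made up, and I have not inspected what any particular compiler generates for the variable-length memcpy or the min.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: zero a 4-byte buffer, then copy min(len, 4) bytes
 * of input into it. */
static void load4_memcpy(unsigned char out[4],
                         const unsigned char *src, size_t len)
{
    size_t n = len < 4 ? len : 4;   /* the comparison meant to become a cmov */
    memset(out, 0, 4);
    memcpy(out, src, n);
}
```

The caller passes the remaining input length, and the decoder always works on the padded local copy.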