~skeeto/public-inbox

2 2

Re: A new protocol and tool for PNG file attachments

Details
Message ID
<f37ea068-3648-77b0-2c5d-4984b9d58c66@tuxpup.com>
DKIM signature
pass
Download raw message
I enjoyed this piece, and I've wanted a similar thing before.

 > I expect any experienced programmer could write a basic attachment 
extractor in their language of choice inside of 30 or so minutes. 
Hooking up a DEFLATE library for decompression would be the most 
difficult part.

That held up well for me. Here's my 15ish minute attempt in python:

https://git.sr.ht/~tuxpup/pypng-detach/tree/main/item/pypng_detach/cli.py

Reading a null-terminated string to get the name feels a little 
contorted in python, but compression was very straightforward.

Thanks for posting a fun experiment.

Re: A new protocol and tool for PNG file attachments

Details
Message ID
<20220102220508.5lk4ddm5wgsh63gb@nullprogram.com>
In-Reply-To
<f37ea068-3648-77b0-2c5d-4984b9d58c66@tuxpup.com> (view parent)
DKIM signature
missing
Download raw message
Nice! Your independent implementation also validates my own. After looking 
at your code, I have some suggestions if you're interested:

* Instead of a racy exists() with "wb", you can use "xb" in a try-except 
with FileExistsError, which will perform the check atomically (i.e. via 
open(2) O_EXCL).

* You don't need decompressobj, which is for streaming. In your case you 
can just use the higher-level zlib.decompress() in place of the entire 
helper function.

* If the attachment chunk is length zero, you get an IndexError. If it 
otherwise doesn't contain a null byte, you treat the whole chunk as the 
file name, and you use the zeroth byte as the compression flag. None of 
this matters for valid inputs, but you may want to more gracefully handle 
invalid inputs. I'd use .find() or .index() on the chunk data to search 
for the null byte index, and skip the chunk if none is found. I'd also 
ensure there's at least one byte following the null byte to avoid either 
reading the wrong byte or an index error.

Re: A new protocol and tool for PNG file attachments

Details
Message ID
<16e59e45-0ed2-9e14-ffd5-64912950ef5f@tuxpup.com>
In-Reply-To
<20220102220508.5lk4ddm5wgsh63gb@nullprogram.com> (view parent)
DKIM signature
pass
Download raw message
Thanks for the feedback! I appreciate the suggestions, and feel like I 
should add a couple of observations that you were too kind to make 
directly just in case someone stumbles over this in the archives and 
considers using it.

On 1/2/22 17:05, Christopher Wellons wrote:
> * Instead of a racy exists() with "wb", you can use "xb" in a try-except 
> with FileExistsError, which will perform the check atomically (i.e. via 
> open(2) O_EXCL).

Thanks very much for pointing out "xb". I never noticed the addition of 
the x mode flag, and I've been moved over to python 3 for far to long to 
have any excuse for that.

> 
> * You don't need decompressobj, which is for streaming. In your case you 
> can just use the higher-level zlib.decompress() in place of the entire 
> helper function.
> 

Very good point. I was looking at some of my own old code when I did 
this rather than zlib docs. That code was streaming, but this'll never 
need to be.

> * If the attachment chunk is length zero, you get an IndexError. If it 
> otherwise doesn't contain a null byte, you treat the whole chunk as the 
> file name, and you use the zeroth byte as the compression flag. None of 
> this matters for valid inputs, but you may want to more gracefully 
> handle invalid inputs. I'd use .find() or .index() on the chunk data to 
> search for the null byte index, and skip the chunk if none is found. I'd 
> also ensure there's at least one byte following the null byte to avoid 
> either reading the wrong byte or an index error.

This is all correct. Like my racy existence guardrail, everything in 
this first attempt is only good enough to process pngs I've produced or 
manually inspected.

Besides what you point out here, I also did not perform any of the 
sanity checks on the filename that you describe, like making sure it 
doesn't start with a dot or contain path delimiters.

The code in my repo definitely shouldn't be used for anything other than 
light experimentation until everything mentioned above is addressed.

I think I'm interested enough to add an encoder that can deflate 
payloads, in addition to making my decoding more robust. I'll post a 
link here when I do.

(I should also add: I used only your blog post and the green block png 
with the compressed attachment as a reference. I didn't click through to 
your implementation until just a minute ago. That speaks favorably to 
the clarity of your explanation. Most of the time for specs adjacent to 
this, I don't get close without significant interaction with the 
reference implementation.)
Reply to thread Export thread (mbox)