Hi,
Great article -- this was quite an interesting demo of how much can be
built from a simple primitive.
As you said, XXTEA is not a proper compression function and doesn't fit
the Merkle-Damgard assumptions, letting the attacker forge MACs. This
doesn't break the scheme fully since the attacker can't compute a
ciphertext from a plaintext without the key, but it is reduced to
unauthenticated XXTEA-CTR.
The attacker needs to see a single ciphertext and the corresponding MAC.
Let's say the ciphertext is (key || c0 || c1 || ... || cn), where ci are
32-bit blocks. The MAC is then computed as
MAC = enc(cn, enc(c[n-1], enc(..., enc(c0, HK)...)))
where HK is the inner hash state after processing the key blocks.
(this ignores the tie-off step for simplicity, but handling it only
requires an extra swap in the attack)
The attacker knows all keys ci, and can decrypt each layer one by one:
HK = dec(c0, dec(..., dec(c[n-1], dec(cn, MAC))...))
and now compute MAC(key || M) for an arbitrary message M by starting
from this value.
There are several constructions that turn block ciphers into one-way
compression functions, such as Davies-Meyer: instead of using E(in,ctx)
as the compression function, you can use E(in,ctx)^ctx. This prevents
the attacker from simply decrypting the block cipher -- an attack that
relies on properties of XXTEA might be possible, but it wouldn't be
independent of the primitive.

Thanks, Dimitrije! I've learned something new, so my exercise continues to
serve its purpose. My mistake seems really obvious after having it spelled
out. Alternative to your suggested fix, it seems HMAC would have rescued
this MAC as well. I'd have been better off skipping the "swap" trick
altogether and just relying on that.