Date: Mon, 7 Mar 2022 09:55:07 +0100
From: Angel Ortega
To: ~skeeto/public-inbox@lists.sr.ht
Subject: Re: Compressing and embedding a Wordle word list
Organization: triptico.com

I see another approach. Assuming that every letter of the alphabet starts at least one word in the list, the first letter doesn't need to be stored in the table at all. Because the list is sorted, a word's second letter can only be lower than the previous word's second letter when the first letter has advanced. So the decoder can take the first letter of the previously decoded word and increment it by one whenever the new word's second letter is lower than the previous word's.
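A minimal C sketch of this scheme, under stated assumptions: words are lowercase a-z, the list is sorted and starts with 'a', each of the four stored letters takes 5 bits, and the bits are packed least-significant-first into a byte array. The names (put_bits, get_bits, encode, decode_next, roundtrip_ok) and the bit layout are hypothetical illustrations, not the code from the original article. The decoder keeps the previous word in a five-byte buffer and applies the "second letter dropped" rule.

```c
#include <assert.h>
#include <string.h>

/* Bits per stored word: 4 letters (first letter omitted) x 5 bits each. */
#define WORD_BITS 20

/* Append the low nbits of v to the bit stream, least significant bit first. */
static void put_bits(unsigned char *buf, long *pos, unsigned v, int nbits)
{
    for (int i = 0; i < nbits; i++, (*pos)++) {
        if (v >> i & 1) {
            buf[*pos / 8] |= (unsigned char)(1u << (*pos % 8));
        }
    }
}

/* Read nbits starting at bit offset pos, least significant bit first. */
static unsigned get_bits(const unsigned char *buf, long pos, int nbits)
{
    unsigned v = 0;
    for (int i = 0; i < nbits; i++, pos++) {
        v |= (unsigned)(buf[pos / 8] >> (pos % 8) & 1) << i;
    }
    return v;
}

/* Encode a sorted list, keeping only letters 2..5 of each word.
 * buf must be zeroed and at least (n*WORD_BITS + 7)/8 bytes long. */
static void encode(const char (*words)[6], long n, unsigned char *buf)
{
    long pos = 0;
    for (long i = 0; i < n; i++) {
        for (int j = 1; j < 5; j++) {
            assert(words[i][j] >= 'a' && words[i][j] <= 'z');
            put_bits(buf, &pos, (unsigned)(words[i][j] - 'a'), 5);
        }
    }
}

/* Decode word i in place. word must hold word i-1 (ignored for i == 0),
 * so words have to be decoded in order. */
static void decode_next(const unsigned char *buf, long i, char word[5])
{
    char prev_second = word[1];
    for (int j = 1; j < 5; j++) {
        word[j] = (char)('a' + get_bits(buf, i*WORD_BITS + (j - 1)*5, 5));
    }
    if (i == 0) {
        word[0] = 'a';      /* assumption: the list starts with 'a' */
    } else if (word[1] < prev_second) {
        word[0]++;          /* sorted list: a drop means a new first letter */
    }
}

/* Round-trip check: encode, then decode every word in order and compare. */
static int roundtrip_ok(const char (*words)[6], long n, unsigned char *buf)
{
    char word[5] = {0};
    encode(words, n, buf);
    for (long i = 0; i < n; i++) {
        decode_next(buf, i, word);
        if (memcmp(word, words[i], 5) != 0) {
            return 0;
        }
    }
    return 1;
}
```

The rule only holds if every first-letter boundary in the data shows a drop in the second letter: a list such as "abbey", "azure", "baker" round-trips, but a hypothetical "azure" followed by "bzzzz" would be decoded as "azzzz". Comparing the whole four-letter tail instead of just the second letter would catch more boundaries, at the cost of a slightly longer comparison.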
If the word[5] buffer holds the previous word, this is easy to do. With this scheme each word needs only 20 bits (the four remaining letters at 5 bits each), and, indexing the database by bit as before, the whole list takes 12,972 words × 20 bits / 8 = 32,430 bytes.

I agree that the decoder will be a bit more complicated (though not by much) and not as elegant.

Best regards,
Ángel