Date: Wed, 29 Dec 2021 12:11:01 -0500
From: Christopher Wellons
To: Ron Yorston
Cc: ~skeeto/public-inbox@lists.sr.ht
Subject: busybox-w32 and UTF-8
Message-ID: <20211229171101.l5doxlbr5jysfy53@nullprogram.com>

Hey, Ron.

I recently learned about a relatively new Windows feature that sets the
"Active Code Page" for a process to UTF-8 when it starts. This causes most
ANSI Win32 functions to encode Unicode strings as UTF-8, notably including
GetCommandLineA() and CreateFileA(). Since every Windows C and C++ runtime
hooks up to the ANSI API, C and C++ programs are upgraded to support
Unicode the same way they do on every other platform, and Windows doesn't
require so much special handling: they get UTF-8-encoded arguments and can
access Unicode paths via UTF-8. Even busybox-w32 can take advantage of
this, at least to some extent.

In case you're interested, I elaborated on it in this issue:

https://github.com/skeeto/w64devkit/issues/15

Unfortunately, this does not cover ReadConsoleA() or ReadFile() on a
console.