Hi, something that isn't stressed in the article is that
SetConsoleOutputCP() changes global state: the code page belongs to the
console, not to your process. So if your program does it, then yes, it might
correctly process UTF-8 with libc procedures, but once it's done your console
will be broken! A simple example:
1. Open w64devkit
2. Verify that current active code page is IBM437:
$ chcp
Active code page: 437
3. Create and enter a directory with a Unicode character in its name:
$ mkdir tortüga
$ cd tortüga
4. Create a sample file main.c:
#include <stdio.h>
typedef signed int b32;
typedef unsigned int u32;
enum {
    IBM437 = 437,
    CP_UTF8 = 65001,
};
#define W32(r) __declspec(dllimport) r __stdcall
W32(b32) SetConsoleOutputCP(u32);
int main(int argc, char *argv[])
{
    fprintf(stderr, "USAGE: %s <args>\n", argv[0]);
    /* SetConsoleOutputCP(IBM437); */
    return 0;
}
5. Obtain libwinsane.o and copy it to the directory.
6. Compile: gcc main.c libwinsane.o
7. Exit directory and launch the binary:
$ cd ..
$ ./tortüga/a.exe
USAGE: ./tortüga/a.exe <args>
8. As you can see, it printed the character correctly. However:
$ cd tort�ga
$ pwd
C:/Users/aragnir/code/tort�ga
9. If you uncomment the line that sets the code page back to IBM437, then there
is no lingering side effect. No way I'm doing that at every exit point in my
program!
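One way to avoid touching every exit point might be to save the original code page and restore it from an atexit() handler. The following is only a rough sketch, and it assumes the program switches the code page itself rather than relying on libwinsane's startup code (which would already have changed it before main() runs). The names use_output_cp and restore_output_cp are made up for illustration, and the non-Windows branch is just a stand-in so the sketch compiles elsewhere.

```c
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-in for non-Windows builds (an assumption for illustration only):
   a fake "console" whose output code page starts at 437. */
static unsigned fake_cp = 437;
static unsigned GetConsoleOutputCP(void) { return fake_cp; }
static int SetConsoleOutputCP(unsigned cp) { fake_cp = cp; return 1; }
#endif

/* Code page to put back at exit; 0 means there is nothing to restore. */
static unsigned saved_cp;

/* Hypothetical helper: put the console back the way we found it.
   Calling it more than once is harmless. */
static void restore_output_cp(void)
{
    if (saved_cp) {
        SetConsoleOutputCP(saved_cp);
        saved_cp = 0;
    }
}

/* Hypothetical helper: switch the output code page, remember the old one,
   and arrange for it to be restored on normal exit.
   Returns nonzero on success. */
static int use_output_cp(unsigned cp)
{
    unsigned old = GetConsoleOutputCP();
    if (!old || !SetConsoleOutputCP(cp)) {
        return 0;  /* no console attached, or the switch failed */
    }
    if (!saved_cp) {
        saved_cp = old;
        atexit(restore_output_cp);
    }
    return 1;
}
```

A program would call use_output_cp(65001) once near the top of main(). Note that atexit() handlers only run on a normal exit() or return from main(), so a process that is killed, or that calls abort(), would still leave the console on the new code page.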
On a side note, if you compile libwinsane.o with clang (from llvm-mingw), but
link with ld (not lld), then compiling the whole thing with gcc produces a
malformed binary:
$ cd ~/code/skeeto_scratch/libwinsane
$ make CC=x86_64-w64-mingw32-clang
x86_64-w64-mingw32-clang -Os -g -Wall -Wextra -c -o init.o init.c
x86_64-w64-mingw32-windres -o manifest.o manifest.rc
x86_64-w64-mingw32-ld -relocatable -o libwinsane.o init.o manifest.o
$ cd ../../tortüga
$ cp ../skeeto_scratch/libwinsane/libwinsane.o .
$ gcc main.c libwinsane.o
C:/Users/aragnir/code/shared/w64devkit/bin/ld.exe: a.exe:/4: section below image base
$ ./a.exe
sh: ./a.exe: Exec format error
So it seems that gcc and clang don't cooperate here.
Thanks, Pavel, and good point! I've added a note to my article. It's been
three years, and I never made significant use of libwinsane, in large part
because of this issue. It's an unsatisfactory solution in general, though
convenient for a quick port.
Some good news: Your example will no longer demonstrate the problem in the
next x64 w64devkit release. I've enabled Unicode in 64-bit builds, so
shell behavior no longer depends on the console code page. Most other
included software still does, though, so you'd only need to change your
example slightly. The fundamental problem doesn't change.
> compiling the whole thing with gcc produces a malformed binary
That's not too surprising, particularly with windres involved. If it's
LLVM versus Binutils, generally it's a Binutils bug, so bfd rather than
LLVM windres. That's where I'd look first. This year I observed a similar
incompatibility between Binutils import libraries and MSVC link.exe:
https://github.com/skeeto/w64devkit/issues/135
It seems mixing and matching toolchains doesn't produce robust results
unless there's a hard module boundary mediating them. I bet hardly anyone
is doing this, so it doesn't get noticed and fixed.