~sircmpwn/hare-dev

hare.ini proof-of-concept v1 NEEDS REVISION

Drew DeVault: 1
 hare.ini proof-of-concept

 9 files changed, 164 insertions(+), 8 deletions(-)
Again, I think wordexp is the obvious solution here.
wordexp doesn't really help here though, at least not with libs:cmd (or
libs:sh I guess). I think it makes sense to just use the shell for this,
but since the command is actually being executed by the shell and not
just split, there's not really any reason to use wordexp on its own.

I think splitting libs up strikes a good balance: it's possible to run
commands in the system shell if necessary, but the shell isn't used
universally. IMO the shell is the right tool for the job when the job
is, executing commands. Outside of that context, the word splitting
semantics of the shell just aren't desirable, since there's other better
options. I say the other options are better because it's still not
possible to correctly implement wordexp, assuming wordexp just uses the
system shell (which I agree is the best approach). The best that can be
done are janky hacks like in musl.
the other nice thing about libs:sh is that it could be generalized to
all keys, rather than just being for libs

one other thing that occurs to me: libs (without :sh) does still want to
be split, likely with shlex, since this:

libs=-lfoo -lbar

wants to be passed as two args to ld. the output of the shell command
would also need to be shlexed, so this would probably just be handled in
the logic for [link]libs - *:sh could be converted to s/:sh// elsewhere
in the code
so, the full algorithm is:
- for each key in the hare.ini file
  - if the key ends in ":sh", get rid of the :sh on the key and replace
    the value with sh -c "$value". this could also be :cmd, don't have
    strong opinions
  - otherwise, if the key ends in ":env" and we decide that doing this
    is a good idea, get rid of the :env and replace the value with
    ${value}
  - look up the key + section and run the relevant function for it (or
    error out on unrecognized keys)
    - for [link]libs, shlex the value and append the results to the link
      array on the relevant module
    - for tags, call merge_tags(&ctx.ctx.tags, value)
    - probably other things, but these are the two keys i can think of
      off the top of my head
It sounds like a lot of hacks to avoid what wordexp is designed to do
for fairly arbitrary reasons. IMO wordexp is a spot-on tool for this
use-case.
How are they hacks?
libs:sh=echo "foo bar" baz

This outputs `foo bar baz` and the quoting is lost. It gets run through
shlex and the quotes are lost, so it's split into [foo, bar, baz]. If
you want it to be [foo bar, baz] you'd need the less intuitive

libs:sh=echo '"foo bar"' baz

Footguns abound.

Or we could use wordexp, which just does the right thing.
For the use cases in hare.ini I can't think of any time that this would
matter though. Are spaces ever needed for libs?
I'm not sure shlex should be used at all here tbh. If we can make the
assumption that spaces are always used as field separators then there's
not much gained from any approach besides splitting whitespace.
Ah, true, fair enough. I'm still not a fan of wordexp (and would still
like to hear alternative solutions tbh) but I see how it solves problems
like this.

With regard to wordexp: I'd be much more open to it if I could see it
being implemented in Hare in a way that isn't a complete hack. And maybe
that isn't possible; maybe musl's implementation is really the best that
can be done. But that's really my main concern about wordexp in general.

If we put wordexp in the stdlib I also think we should deviate from
POSIX and not support flags like NOCMD, since it provides a false sense
of security, and using wordexp to read untrusted user input is always a
bug.
if we do end up going with wordexp, i'd really prefer for it to be
purely internal to the code that handles hare.ini, since it's not a good
interface to encourage using
Can you guys explain why wordexp bothers you so?
I'd be fine with wordexp being a user-facing stdlib module if we end up
using it for hare.ini. My main problems with wordexp as an interface are
actually problems with POSIX sh, but these issues don't matter as much
if wordexp just uses the system shell.

I think that most of the time wordexp isn't the right tool for the job,
since shell expansion is often just a bad idea, and a footgun in the
case of poorly-designed shells like POSIX sh. But I suppose it sometimes
can be useful, so I'm not blanketly opposed to exporting it.
If we don't end up using wordexp for hare.ini (which, again, I'd still
like to see alternatives, just because I dislike shell syntax and its
associated footguns), then I don't think it should be in the stdlib.


the issue i have with wordexp is that it's trying to build an idiomatic
hare interface to the shell over the sh(1p) cli, which fundamentally
doesn't provide a comprehensive enough interface to be able to reliably
and maintainably implement wordexp

"pipe this through sh -c" + "use this literal text" isn't quite as
elegant to the end-user, sure, but i don't want to have to maintain a
wordexp implementation in the stdlib

(i've also never been a huge fan of having wordexp on the roadmap, and
i'd be perfectly fine updating the roadmap to remove it from there)
I mean, it's not that bad, right? Have you seen my latest patch for it?

Export patchset (mbox)
How do I use this?

Copy & paste the following snippet into your terminal to import this patchset into git:

curl -s https://lists.sr.ht/~sircmpwn/hare-dev/patches/45143/mbox | git am -3
Learn more about email & git

[RFC PATCH] hare.ini proof-of-concept Export this patch

---
This is a basic proof of concept for how simple hare.ini support might
start shaping up. This is sufficient to automatically link to system
libraries when importing bindings, which is one of the use-cases for
which hare.ini was proposed:

	$ cat pixman/hare.ini 
	[link]
	libs=$(pkg-config --libs --static pixman-1)
	$ cat main.ha 
	use fmt;
	use pixman;
	use types::c;

	export fn main() void = {
		const ver = pixman::version_string();
		const ver = c::tostr(ver)!;
		fmt::printfln("pixman version {}", ver)!;
	};
	$ hare run main.ha 
	pixman version 0.42.2

 cmd/hare/build.ha       | 46 +++++++++++++++++++--
 cmd/hare/build/ini.ha   | 89 +++++++++++++++++++++++++++++++++++++++++
 cmd/hare/build/types.ha |  2 +-
 cmd/hare/build/util.ha  |  1 -
 hare/module/srcs.ha     | 16 ++++++++
 hare/module/types.ha    |  6 +++
 scripts/gen-stdlib      |  3 +-
 sort/search.ha          |  3 ++
 stdlib.mk               |  6 ++-
 9 files changed, 164 insertions(+), 8 deletions(-)
 create mode 100644 cmd/hare/build/ini.ha

diff --git a/cmd/hare/build.ha b/cmd/hare/build.ha
index fa55878d..a4b51dfe 100644
--- a/cmd/hare/build.ha
+++ b/cmd/hare/build.ha
@@ -76,7 +76,7 @@ fn build(name: str, cmd: *getopt::command) (void | error) = {
		case 'L' =>
			append(ctx.libdirs, opt.1);
		case 'l' =>
			append(ctx.libs, opt.1);
			append(ctx.libs, fmt::asprintf("-l{}", opt.1));
		case 'N' =>
			ast::ident_free(ctx.ns);
			ctx.ns = [];
@@ -126,6 +126,8 @@ fn build(name: str, cmd: *getopt::command) (void | error) = {
		os::exit(os::status::FAILURE);
	};

	const input = if (len(cmd.args) == 0) os::getcwd() else cmd.args[0];

	ctx.arch = arch.qbe_name;
	ctx.cmds = ["",
		os::tryenv("HAREC", "harec"),
@@ -134,14 +136,52 @@ fn build(name: str, cmd: *getopt::command) (void | error) = {
		os::tryenv("LD", arch.ld_cmd),
	];
	set_arch_tags(&ctx.ctx.tags, arch);

	let libc = false;
	if (len(ctx.libs) > 0) {
		libc = true;
		merge_tags(&ctx.ctx.tags, "+libc")?;
		ctx.cmds[build::stage::BIN] = os::tryenv("CC", arch.cc_cmd);
	};

	const input = if (len(cmd.args) == 0) os::getcwd() else cmd.args[0];

	ctx.mods = build::gather(&ctx, os::realpath(input)?)?;
	for (let i = 0z; i < len(ctx.mods); i += 1) {
		const ini = ctx.mods[i].srcs.ini;
		if (ini != "") {
			build::applyconfig(&ctx, &ctx.mods[i], ini)?;
		};
	};

	// If hare.ini added any external libraries, we have to re-gather
	// modules with +libc
	//
	// XXX: This is a bit of a hack. The main issue here is that the list of
	// dependencies might change between +libc and -libc, and if so, maybe
	// we will eliminate all of the hare.ini files which switched on +libc
	// in the first place. Attempting to resolve this recursively is
	// basically impossible so just forget about it and stick with linking
	// to libc if at any point a configuration is produced which would link
	// to libc.
	//
	// Possible solutions for the broader set of hacks:
	//
	// 1. We want to get rid of tag-based file selection and move to
	//    hare.ini for configuring file selection. This problem might be
	//    reduced at that point.
	// 2. We could use a separate pass for finding hare.ini files, before we
	//    gather the rest of the sources, then add +libc before doing the
	//    second step.
	//
	// Both of these solutions would be cleaner, but we might still end up
	// with the problem that the dependency tree changes once +libc is
	// turned on.
	if (len(ctx.libs) > 0 && !libc) {
		merge_tags(&ctx.ctx.tags, "+libc")?;
		ctx.cmds[build::stage::BIN] = os::tryenv("CC", arch.cc_cmd);
		module::free_slice(ctx.mods);
		ctx.mods = build::gather(&ctx, os::realpath(input)?)?;
	};

	append(ctx.hashes, [[void...]...], len(ctx.mods));

	let built = build::execute(&ctx)?;
diff --git a/cmd/hare/build/ini.ha b/cmd/hare/build/ini.ha
new file mode 100644
index 00000000..98cc0d6c
--- /dev/null
+++ b/cmd/hare/build/ini.ha
@@ -0,0 +1,89 @@
use encoding::utf8;
use fmt;
use format::ini;
use fs;
use hare::module;
use io;
use os;
use os::exec;
use sort;
use strings;
use wordexp;

// Applies configuration from a hare.ini file for a given module to the build
// context.
export fn applyconfig(
	ctx: *context,
	mod: *module::module,
	path: str,
) (void | error) = {
	const in = os::open(path)?;
	defer io::close(in)!;
	const scan = ini::scan(in);
	defer ini::finish(&scan);

	for (true) {
		const (sec, key, val) = match (ini::next(&scan)) {
		case let entry: ini::entry =>
			yield entry;
		case io::EOF =>
			break;
		case =>
			return module::invalid_ini;
		};

		const func = match (sort::search(
			conf_sections,
			size((str, *opaque)),
			&sec, &conf_cmp,
		)) {
		case void =>
			continue;
		case let i: size =>
			yield conf_sections[i].1;
		};
		func(ctx, key, val)?;
	};
};

// sorted
const conf_sections = [
	("link", &conf_link),
];

fn conf_link(ctx: *context, key: str, value: str) (void | error) = {
	const func = match (sort::search(
		conf_link_keys,
		size((str, *opaque)),
		&key, &conf_cmp,
	)) {
	case void =>
		return;
	case let i: size =>
		yield conf_link_keys[i].1;
	};
	func(ctx, value)?;
};

// sorted
const conf_link_keys = [
	("libs", &conf_link_libs),
];

fn conf_link_libs(ctx: *context, value: str) (void | error) = {
	match (wordexp::wordexp(value, wordexp::flag::SHOWERR)) {
	case let err: (io::error | exec::error) =>
		return err;
	case (utf8::invalid | wordexp::we_error) =>
		return module::invalid_ini;
	case let words: []str =>
		append(ctx.libs, words...);
		free(words);
	};
};

fn conf_cmp(a: const *opaque, b: const *opaque) int = {
	const key = a: const *str;
	const val = b: const *(str, *opaque);
	return strings::compare(*key, val.0);
};
diff --git a/cmd/hare/build/types.ha b/cmd/hare/build/types.ha
index 34461b6a..b436d436 100644
--- a/cmd/hare/build/types.ha
+++ b/cmd/hare/build/types.ha
@@ -94,7 +94,7 @@ export fn ctx_finish(ctx: *context) void = {
	};
	free(ctx.defines);
	free(ctx.libdirs);
	free(ctx.libs);
	strings::freeall(ctx.libs);
	ast::ident_free(ctx.ns);
	free(ctx.version);
	module::free_slice(ctx.mods);
diff --git a/cmd/hare/build/util.ha b/cmd/hare/build/util.ha
index f421699c..673d1e3b 100644
--- a/cmd/hare/build/util.ha
+++ b/cmd/hare/build/util.ha
@@ -239,7 +239,6 @@ fn get_args(ctx: *context, tmp: str, flags: []str, t: *task) []str = {
			append(args, strings::dup("-Wl,--no-gc-sections"));
		};
		for (let i = 0z; i < len(ctx.libs); i += 1) {
			append(args, strings::dup("-l"));
			append(args, strings::dup(ctx.libs[i]));
		};
		yield []: []str;
diff --git a/hare/module/srcs.ha b/hare/module/srcs.ha
index 07e093a1..ed2bd52f 100644
--- a/hare/module/srcs.ha
+++ b/hare/module/srcs.ha
@@ -27,6 +27,8 @@ export type srcset = struct {
	// source files. These are sorted alphabetically, and are the set of
	// tags that should be used to find this module in the cache.
	seentags: []str,
	// hare.ini, if present, or ""
	ini: str,
	// hare source files (.ha)
	ha: []str,
	// assembly source files (.s)
@@ -45,6 +47,7 @@ export fn finish_srcset(srcs: *srcset) void = {
	strings::freeall(srcs.s);
	strings::freeall(srcs.o);
	strings::freeall(srcs.sc);
	free(srcs.ini);
};

// Find the on-disk path and set of source files for a given module. The path is
@@ -117,6 +120,9 @@ fn path_find(ctx: *context, buf: *path::buffer) (srcset | error) = {
			yield &res.o;
		case "sc" =>
			yield &res.sc;
		case "ini" =>
			res.ini = srcs[i].3[0];
			continue;
		case => abort();
		};
		append(out, srcs[i].3[0]);
@@ -203,6 +209,16 @@ fn _findsrcs(
	};

	if (fs::isfile(stat.mode)) {
		if (path::peek(buf) as str == "hare.ini") {
			append(srcs, (
				strings::dup("hare"),
				strings::dup("ini"),
				0,
				alloc([strings::dup(path::string(buf))]),
			));
			return;
		};

		let ext = match (path::pop_ext(buf)) {
		case void =>
			return;
diff --git a/hare/module/types.ha b/hare/module/types.ha
index c7e3bbbd..17fc9d66 100644
--- a/hare/module/types.ha
+++ b/hare/module/types.ha
@@ -17,6 +17,9 @@ export type tag_has_dot = !void;
// Generic badly formatted tag error.
export type tag_bad_format = !void;

// An invalid hare.ini file is present
export type invalid_ini = !void;

// A dependency cycle error.
export type dep_cycle = ![]str;

@@ -42,6 +45,7 @@ export type error = !(
	tag_has_dot |
	tag_bad_format |
	errcontext |
	invalid_ini |
);

// A container struct for context, used by [[gather]].
@@ -124,6 +128,8 @@ fn _strerror(e: error, buf: *memio::stream) void = {
		memio::concat(buf, ctx.0, ": ")!;
		_strerror(*ctx.1, buf);
		return;
	case invalid_ini =>
		yield "Invalid hare.ini file";
	};
	memio::concat(buf, s)!;
};
diff --git a/scripts/gen-stdlib b/scripts/gen-stdlib
index 2382b56c..79f1aa0c 100755
--- a/scripts/gen-stdlib
+++ b/scripts/gen-stdlib
@@ -210,12 +210,13 @@ bytes() {
cmd_hare_build() {
	gen_srcs cmd::hare::build \
		gather.ha \
		ini.ha \
		queue.ha \
		types.ha \
		util.ha
	gen_ssa cmd::hare::build encoding::hex crypto::sha256 errors fmt fs \
		hare::ast hare::module hare::unparse hash io memio os os::exec path \
		sort strings shlex unix::tty
		sort strings shlex unix::tty format::ini wordexp
}

crypto() {
diff --git a/sort/search.ha b/sort/search.ha
index 5ae78ffd..c31f97a4 100644
--- a/sort/search.ha
+++ b/sort/search.ha
@@ -3,6 +3,9 @@

// Performs a binary search over a sorted slice. If the key is found, index of
// the matching item in the slice is returned. Otherwise, void is returned.
//
// The compare function is called with the key as the first argument, and the
// value to test against in the second argument.
export fn search(
	in: []const opaque,
	sz: size,
diff --git a/stdlib.mk b/stdlib.mk
index f5ddf29b..d864c663 100644
--- a/stdlib.mk
+++ b/stdlib.mk
@@ -968,11 +968,12 @@ $(HARECACHE)/bytes/bytes-any.ssa: $(stdlib_bytes_any_srcs) $(stdlib_rt) $(stdlib
# cmd::hare::build (+any)
stdlib_cmd_hare_build_any_srcs = \
	$(STDLIB)/cmd/hare/build/gather.ha \
	$(STDLIB)/cmd/hare/build/ini.ha \
	$(STDLIB)/cmd/hare/build/queue.ha \
	$(STDLIB)/cmd/hare/build/types.ha \
	$(STDLIB)/cmd/hare/build/util.ha

$(HARECACHE)/cmd/hare/build/cmd_hare_build-any.ssa: $(stdlib_cmd_hare_build_any_srcs) $(stdlib_rt) $(stdlib_encoding_hex_$(PLATFORM)) $(stdlib_crypto_sha256_$(PLATFORM)) $(stdlib_errors_$(PLATFORM)) $(stdlib_fmt_$(PLATFORM)) $(stdlib_fs_$(PLATFORM)) $(stdlib_hare_ast_$(PLATFORM)) $(stdlib_hare_module_$(PLATFORM)) $(stdlib_hare_unparse_$(PLATFORM)) $(stdlib_hash_$(PLATFORM)) $(stdlib_io_$(PLATFORM)) $(stdlib_memio_$(PLATFORM)) $(stdlib_os_$(PLATFORM)) $(stdlib_os_exec_$(PLATFORM)) $(stdlib_path_$(PLATFORM)) $(stdlib_sort_$(PLATFORM)) $(stdlib_strings_$(PLATFORM)) $(stdlib_shlex_$(PLATFORM)) $(stdlib_unix_tty_$(PLATFORM))
$(HARECACHE)/cmd/hare/build/cmd_hare_build-any.ssa: $(stdlib_cmd_hare_build_any_srcs) $(stdlib_rt) $(stdlib_encoding_hex_$(PLATFORM)) $(stdlib_crypto_sha256_$(PLATFORM)) $(stdlib_errors_$(PLATFORM)) $(stdlib_fmt_$(PLATFORM)) $(stdlib_fs_$(PLATFORM)) $(stdlib_hare_ast_$(PLATFORM)) $(stdlib_hare_module_$(PLATFORM)) $(stdlib_hare_unparse_$(PLATFORM)) $(stdlib_hash_$(PLATFORM)) $(stdlib_io_$(PLATFORM)) $(stdlib_memio_$(PLATFORM)) $(stdlib_os_$(PLATFORM)) $(stdlib_os_exec_$(PLATFORM)) $(stdlib_path_$(PLATFORM)) $(stdlib_sort_$(PLATFORM)) $(stdlib_strings_$(PLATFORM)) $(stdlib_shlex_$(PLATFORM)) $(stdlib_unix_tty_$(PLATFORM)) $(stdlib_format_ini_$(PLATFORM)) $(stdlib_wordexp_$(PLATFORM))
	@printf 'HAREC \t$@\n'
	@mkdir -p $(HARECACHE)/cmd/hare/build
	@$(stdlib_env) $(HAREC) $(HARECFLAGS) -o $@ -Ncmd::hare::build \
@@ -3452,11 +3453,12 @@ $(TESTCACHE)/bytes/bytes-any.ssa: $(testlib_bytes_any_srcs) $(testlib_rt) $(test
# cmd::hare::build (+any)
testlib_cmd_hare_build_any_srcs = \
	$(STDLIB)/cmd/hare/build/gather.ha \
	$(STDLIB)/cmd/hare/build/ini.ha \
	$(STDLIB)/cmd/hare/build/queue.ha \
	$(STDLIB)/cmd/hare/build/types.ha \
	$(STDLIB)/cmd/hare/build/util.ha

$(TESTCACHE)/cmd/hare/build/cmd_hare_build-any.ssa: $(testlib_cmd_hare_build_any_srcs) $(testlib_rt) $(testlib_encoding_hex_$(PLATFORM)) $(testlib_crypto_sha256_$(PLATFORM)) $(testlib_errors_$(PLATFORM)) $(testlib_fmt_$(PLATFORM)) $(testlib_fs_$(PLATFORM)) $(testlib_hare_ast_$(PLATFORM)) $(testlib_hare_module_$(PLATFORM)) $(testlib_hare_unparse_$(PLATFORM)) $(testlib_hash_$(PLATFORM)) $(testlib_io_$(PLATFORM)) $(testlib_memio_$(PLATFORM)) $(testlib_os_$(PLATFORM)) $(testlib_os_exec_$(PLATFORM)) $(testlib_path_$(PLATFORM)) $(testlib_sort_$(PLATFORM)) $(testlib_strings_$(PLATFORM)) $(testlib_shlex_$(PLATFORM)) $(testlib_unix_tty_$(PLATFORM))
$(TESTCACHE)/cmd/hare/build/cmd_hare_build-any.ssa: $(testlib_cmd_hare_build_any_srcs) $(testlib_rt) $(testlib_encoding_hex_$(PLATFORM)) $(testlib_crypto_sha256_$(PLATFORM)) $(testlib_errors_$(PLATFORM)) $(testlib_fmt_$(PLATFORM)) $(testlib_fs_$(PLATFORM)) $(testlib_hare_ast_$(PLATFORM)) $(testlib_hare_module_$(PLATFORM)) $(testlib_hare_unparse_$(PLATFORM)) $(testlib_hash_$(PLATFORM)) $(testlib_io_$(PLATFORM)) $(testlib_memio_$(PLATFORM)) $(testlib_os_$(PLATFORM)) $(testlib_os_exec_$(PLATFORM)) $(testlib_path_$(PLATFORM)) $(testlib_sort_$(PLATFORM)) $(testlib_strings_$(PLATFORM)) $(testlib_shlex_$(PLATFORM)) $(testlib_unix_tty_$(PLATFORM)) $(testlib_format_ini_$(PLATFORM)) $(testlib_wordexp_$(PLATFORM))
	@printf 'HAREC \t$@\n'
	@mkdir -p $(TESTCACHE)/cmd/hare/build
	@$(testlib_env) $(HAREC) $(TESTHARECFLAGS) -o $@ -Ncmd::hare::build \
-- 
2.42.0
i like the idea of hare.ini as a whole, but i think this needs more work

i think we should also have a global tags= directive rather than
implicitly adding +libc when libs are specified:

tags=+libc

[link]
libcmd=pkg-config foo

which'll allow us to do [link+tag1+tag2] without having weirdness
because of the possibility of eg. [link-libc]

this should be processed in hare::module, allowing us to avoid rerunning
dependency resolution in cmd/hare. [foo+bar+baz] should also be handled
in hare::module, but i don't have strong opinions on whether [link]
should be handled either in hare::module (adding a link: str field to
modules) or in cmd/hare

i think we should also remove -l, since hare.ini obsoletes it, and we
should also get rid of the special case for README in favor of requiring
a hare.ini when there're no hare sources

lastly, i think we can avoid wordexp:

tags=+libc

[link]
# or maybe lib=-lfoo? open to either. note that this wouldn't get
# shlexed, in order to require the use of multiple lib= directives for
# multiple libs. i'd also be fine having a libs= which requires -l and
# does shlex stuff, but i kinda prefer lib= for consistency with libcmd=
# (which does need to be singular because that's how we avoid
# reimplementing a shell parser)
lib=foo

# but the output of this command would be
libcmd=pkg-config bar

# idk if we actually want this, but it's an option
libvar=BAZ

i really really would rather not have a wordexp in the stdlib, and this
design avoids creating our own special alternate format while still
allowing everything we want to be possible here. it should also be
pretty easy to implement
-%<-