> Access to the hardware is provided through normal files, and per-
> process namespaces do not require special permissions to modify
> mountpoints. Making a container is thus trivial: just unmount all of
> the hardware you don’t want the sandboxed program to have access to.
> Done.
I'm not a Plan 9 expert, but I'm pretty sure that's not the case. If you
check `ns`, all the files corresponding to hardware are provided from
special directories corresponding to kernel devices, for example `#S`
(storage).
Even if you unmount them from your namespace - the "sandboxed" program
can mount them right back, by e.g. running the commands as listed in
ns's output.
As far as I can tell, the only way to prevent that is the noattach flag
(set by RFNOMNT), which completely disables mount(). Obviously, that
breaks a lot of programs.
Even if you don't mind programs randomly breaking because you've
disabled a core feature, the
> You don't even have to be root.
part is starting to get sketchy too. Not having RFNOMNT is basically the
same as having root. Once you sandbox a program, it's no longer "root",
and it can no longer mount(), making its sandboxing capabilities much
more limited.
But again - I'm definitely not an expert on this. If I'm wrong, I'd love
to get corrected by an actual Plan 9 user/developer.
Relevant functions:
    namec in 9/port/chan.c
    bindmount in 9/port/sysfile.c
On a similar note:
> Want to forward a TCP port? Write an implementation of /net/tcp which
> is limited to whatever ports you need — perhaps with just a hundred
> lines of shell scripting — and mount it into the namespace.
I always found that kinda weird. Why should you have to write a whole
TCP implementation just to forward a single port? Wouldn't it be more
natural for the TCP implementation to accept the address/port in the
path? For example, open("/net/listen/0.0.0.0/tcp/80").
Then you can manage all of a program's privileges as a simple list of
paths, which is simpler to reason about.
I've implemented that in my toy OS (basically Plan 9 with containers)
and it seems to work fine. I'm probably missing something, though.
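A rough sketch of what such a path-list check could look like, in Python
for brevity. The names (ALLOWED, may_open) and the /tmp/scratch entry are
made up for illustration, and matching on whole path components is an
assumption; this is not how Plan 9 or any particular OS does it.

```python
import posixpath

# Hypothetical privilege list for one program: it may listen on TCP
# port 80 on any address, and use anything under /tmp/scratch.
ALLOWED = [
    "/net/listen/0.0.0.0/tcp/80",
    "/tmp/scratch",
]

def may_open(path, allowed=ALLOWED):
    """Return True if path equals an allowed entry or lies beneath one."""
    # Normalize first so "a/../b" tricks cannot escape an allowed prefix.
    path = posixpath.normpath(path)
    for entry in allowed:
        entry = posixpath.normpath(entry)
        # Compare whole components, so "/tmp/scratch2" does not match
        # the "/tmp/scratch" entry.
        if path == entry or path.startswith(entry.rstrip("/") + "/"):
            return True
    return False
```

With this list, may_open("/net/listen/0.0.0.0/tcp/80") is allowed while
may_open("/net/listen/0.0.0.0/tcp/8080") is refused, so auditing a
program's privileges comes down to reading the list.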
Hi,
> Even if you unmount them from your namespace - the "sandboxed" program
> can mount them right back, by e.g. running the commands as listed in
> ns's output.
> As far as I can tell, the only way to prevent that is the noattach flag
> (set by RFNOMNT), which completely disables mount(). Obviously, that
> breaks a lot of programs.
9front has features for more advanced sandboxing, including
auth/box, which allows specifying a full list of allowed drivers, and
constructing an arbitrary sandbox.
Moreover, quoth rfork(2):
    RFNOMNT   If set, subsequent mounts into the new name space
              are disallowed. All pathnames starting with #
              besides those used to access pipe(3), dup(3),
              env(3), cons(3), and proc(3) can not be walked.
RFNOMNT doesn't break core features; if you have something already mounted,
it remains usable, and e.g. pipe (which is implemented via the #| device)
is explicitly allowed.
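To spell that rule out, here is a small Python model of it. It is only a
sketch of the man page text, not the kernel code: can_walk and
RFNOMNT_EXEMPT are made-up names, and the device letters are the ones
those manual pages use (#| pipe, #d dup, #e env, #c cons, #p proc).

```python
# Device letters that stay reachable after RFNOMNT, per the rfork(2)
# text: pipe(3) is #|, dup(3) is #d, env(3) is #e, cons(3) is #c,
# proc(3) is #p.
RFNOMNT_EXEMPT = set("|decp")

def can_walk(path, noattach):
    """Model whether a name may be walked (hypothetical helper).

    noattach is True once the process group has done rfork(RFNOMNT).
    """
    if not path.startswith("#"):
        # Ordinary names resolve through what is already in the
        # namespace; the flag does not affect them.
        return True
    if not noattach:
        return True
    # Assumption: a bare "#" names no device and is refused here.
    return len(path) > 1 and path[1] in RFNOMNT_EXEMPT
```

So can_walk("#|", True) and can_walk("/dev/cons", True) both hold, while
can_walk("#l0/ether0", True) does not: existing mounts and pipes keep
working, but new device attaches are refused.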
Accidentally replied off list initially, apologies for the double-response :/
- Noam Preil