Seeking within files is a very useful operation, especially in a context where you can only have 2 files open at a time.
The tricky part is trying to fit seeking operations into the existing device. we could use the unused file vector for this, but we could also reuse `success` for this, as currently it is never written to.
being able to use `append` to change the seek mode would also be nice (0 = seek from start of file, 1 = seek from current position, 2 = seek from end of file).
after seeking, `success` can be read to get (the bottom 16 bits of) the seek position. having relative seeking allows seeking to positions greater than 2^16.
the offset (written to `success`) would be interpreted as a signed 16 bit integer (this allows you to seek backwards).
trying to seek in a file that is not seekable would have no effect, and the seek offset would be unchanged (but still written to `success`). this can be used to check if a file is seekable.
platforms that cannot easily implement seeking can simply always return a seek offset of 0.
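As a rough illustration of the proposal so far, here is a minimal C sketch of how an emulator might compute the new position. The names `seek_target` and `seek_result` are hypothetical, and clamping a negative result to the start of the file is an assumption that the proposal does not specify.

```c
#include <stdint.h>

/* Hypothetical helper, not part of any existing emulator.
 * mode: 0 = from start, 1 = from current position, 2 = from end
 *       (the values proposed for `append`).
 * raw:  the 16-bit value written to `success`, read as signed. */
static long
seek_target(long cur, long size, int mode, uint16_t raw)
{
    /* reinterpret as two's complement without relying on a cast */
    long off = raw >= 0x8000 ? (long)raw - 0x10000L : (long)raw;
    long base, pos;
    switch(mode) {
    case 0: base = 0; break;
    case 1: base = cur; break;
    case 2: base = size; break;
    default: return cur; /* unknown mode: leave the position alone */
    }
    pos = base + off;
    return pos < 0 ? 0 : pos; /* assumption: clamp at start of file */
}

/* Value a program would read back from `success`: the low 16 bits. */
static uint16_t
seek_result(long pos)
{
    return (uint16_t)(pos & 0xffff);
}
```

Repeated relative seeks (mode 1) are what let a program move past 2^16, even though only the low 16 bits can be read back.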
the exact details of how this would work with writing (especially with append set) are unclear.
of course, there's also the option of reworking the file device completely, if this seems too messy.
- binarycat
File device and the debugger [was Re: Seeking the File device]
Hi all,
On Sat, Jun 04, 2022 at 02:21:48PM -0000, binarycat@envs.net wrote:
>Seeking within files is a very useful operation, especially in a
>context where you can only have 2 files open at a time.
>
>The tricky part is trying to fit seeking operations into the existing
>device. we could use the unused file vector for this, but we could
>also reuse `success` for this, as currently it is never written to.
As Devine said in their reply, we used to have a File device that was
able to seek. Its ports were:
2 × vector (unused)
2 × success
4 × offset
2 × address
2 × length
2 × read (trigger)
2 × write (trigger)
It was totally full of ports already, but it had its own charm. Having a
32-bit offset port (the only 32-bit port in Varvara, period) meant that
accessing files over 64k was no problem, and you had random access.
Truncating or appending files on write was also simple: if the offset
was zero, then the file was truncated, otherwise it was written to in
update mode. (That's not perfect: if you had megabytes of data in a file
and wanted to change the first byte, you had no choice but to truncate
it and write it all again.)
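To make the old layout concrete, here is a small C sketch. The byte offsets are an assumption derived from the port sizes listed above, taken in order; the names `file_offset` and `write_mode` are hypothetical and not from the old emulator source.

```c
#include <stdint.h>

/* Byte offsets within the 16-byte device page (assumed from the
 * port sizes listed above, in order). */
enum {
    FILE_VECTOR = 0x0,
    FILE_SUCCESS = 0x2,
    FILE_OFFSET = 0x4, /* 4 bytes wide */
    FILE_ADDRESS = 0x8,
    FILE_LENGTH = 0xa,
    FILE_READ = 0xc,
    FILE_WRITE = 0xe
};

/* Uxn stores multi-byte values big-endian. */
static uint32_t
file_offset(const uint8_t *dev)
{
    const uint8_t *p = dev + FILE_OFFSET;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Offset zero truncates the file; any other offset updates in place. */
static const char *
write_mode(uint32_t offset)
{
    return offset == 0 ? "wb" : "r+b";
}
```

Note that fopen with "r+b" fails if the file does not exist yet, so a real implementation would need to handle that case separately.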
Its downfall came when we wanted to start reading directories. If random
access was desired, then we needed to keep the full directory listing in
memory, or loop through it each time File/read was called. So the idea
of dealing with large data by calling File/read several times without
changing any of the other parameters was born. Now that File/offset
wasn't strictly necessary, we could replace it with all of File/stat,
File/delete and File/append.
Now I've been working on the in-Uxn debugger, I've been rather pining
for the earlier implementation, because there exists some state that's
internal to the File device that I can't put back on debugger exit.
The debugger works by extending Uxn's memory beyond 64k and having the
stacks and device memory be part of that extended memory:
┌─────────────────┬───────────────────────────────────────┐
│ 0x11800–        │ Unallocated                           │
├─────────────────┼───────────────────────────────────────┤
│ 0x11700–0x117ff │ Debugger reserved                     │
│ 0x11600–0x116ff │ Debugger device memory                │
│ 0x11500–0x115ff │ Debugger return stack                 │
│ 0x11400–0x114ff │ Debugger working stack                │
│ 0x10400–0x113ff │ Debugger program code and working RAM │
├─────────────────┼───────────────────────────────────────┤
│ 0x10300–0x103ff │ Original program reserved             │
│ 0x10200–0x102ff │ Original program device memory        │
│ 0x10100–0x101ff │ Original program return stack         │
│ 0x10000–0x100ff │ Original program working stack        │
├─────────────────┼───────────────────────────────────────┤
│ 0x00000–0x0ffff │ Original program RAM                  │
└─────────────────┴───────────────────────────────────────┘
While the original program is running, LDA/STA can only access the RAM
in the first 64k, so execution is just as we have Uxn today. It doesn't
need to be aware of the stacks and device memory sitting just above, and
it also doesn't need to be aware of the preloaded data sitting between
0x10400 and 0x117ff. (Try not to get too hung up on the details of the
memory addresses or preloading plan, since this is still a work in
progress, but I'd prefer to be self-consistent with this explanation.)
When the program faults, memory is swapped around in uxn_halt() to load
the debugger. The swapping is configurable, but in this example we
always end up with:
┌─────────────────┬───────────────────────────────────────┐
│ 0x11800–        │ Unallocated                           │
├─────────────────┼───────────────────────────────────────┤
│ 0x11700–0x117ff │ Original program reserved             │
│ 0x11600–0x116ff │ Original program device memory        │
│ 0x11500–0x115ff │ Original program return stack         │
│ 0x11400–0x114ff │ Original program working stack        │
│ 0x10400–0x113ff │ Original program RAM (0xf000–0xffff)  │
├─────────────────┼───────────────────────────────────────┤
│ 0x10300–0x103ff │ Debugger reserved                     │
│ 0x10200–0x102ff │ Debugger device memory                │
│ 0x10100–0x101ff │ Debugger return stack                 │
│ 0x10000–0x100ff │ Debugger working stack                │
├─────────────────┼───────────────────────────────────────┤
│ 0x0f000–0x0ffff │ Debugger program code and working RAM │
│ 0x00000–0x0efff │ Original program RAM (0x0000–0xefff)  │
└─────────────────┴───────────────────────────────────────┘
It's a challenge to make the debugger GUI small enough to fit in 0x1000
bytes, but if I succeed then we'll have the layout above, and execution
will continue at 0xf000 (also configurable via the debugger's
System/vector) which is in the hands of the debugger code.
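The move between the two layouts above amounts to two region swaps. The following C sketch shows the idea for this one example configuration; `swap_region` and `enter_debugger` are hypothetical names, and the real swapping in uxn_halt() is configurable rather than hard-coded like this.

```c
#include <stddef.h>
#include <stdint.h>

#define MEM_SIZE 0x11800 /* extended memory used in this example */

/* Exchange two non-overlapping regions byte by byte. */
static void
swap_region(uint8_t *mem, size_t a, size_t b, size_t len)
{
    size_t i;
    for(i = 0; i < len; i++) {
        uint8_t t = mem[a + i];
        mem[a + i] = mem[b + i];
        mem[b + i] = t;
    }
}

/* Hard-coded version of the example layout change. Calling it a
 * second time restores the original layout. */
static void
enter_debugger(uint8_t *mem)
{
    /* top 4k of program RAM <-> debugger code and working RAM */
    swap_region(mem, 0x0f000, 0x10400, 0x1000);
    /* program stacks, device memory, reserved <-> debugger's */
    swap_region(mem, 0x10000, 0x11400, 0x0400);
}
```

Because each swap is its own inverse, the same routine serves for entering and leaving the debugger in this sketch.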
One thing I really like about this setup is that all the stacks at the
time of the fault have been swapped out with fresh, empty stacks for the
debugger to use, so if the working stack overflowed then I don't need to
worry about overflowing it again with my first LIT. The swapping system
itself can be invoked within Uxntal, which the debugger will use to
bring back the original data at 0x10400–0x117ff into addresses that LDA
can inspect (e.g. 0xdc00–0xefff). So far so good.
This also means I don't have to save the values of the ports of the
Screen device before I start drawing my GUI, and to remember to restore
them to what the original program was using on return. The Screen
device's pixels obviously represent a lot of state that I cannot
access, so I will be clobbering stuff when I draw my GUI. But I can
largely mitigate the problems by using the foreground layer only for my
GUI: most Uxntal uses the foreground layer for the mouse cursor only, so
if I clear that layer on returning to the program, the worst impact is
that the mouse cursor isn't visible until the mouse is moved.
Unfortunately, I can't be so blasé about the File device. If I have to
read or write data as part of the debugging process, I can try to detect
if either of the File devices appears to be unused, but if they're both
in use then it's tough luck. I can't recreate what a File device was in
the middle of doing at the time of the fault, so if the program wants to
read or write the next chunk we will have corruption or data loss. This
is why I pine for the older File device, because reads and writes were
one-shot operations driven entirely by the device bytes' state, and
would work fine with the setup I've described.
I'm not saying that we should revert back to the old device without
thought: the abilities to read directory listings and to read data a
byte at a time without closing and reopening the file in C are genuine
advancements over what we had. But to me it shows that there may be some
room for improvement. I've talked about all the above in reference to
the debugger, but aside from the automatic swapping on fault, all of
this can be driven by Uxntal to provide a cooperative multitasking
environment, perhaps one day a true UxnOS without relying on features
provided by the specific port. So the gains can be substantial.
Thanks very much for reading this far!
Best wishes,
Andy
Re: File device and the debugger [was Re: Seeking the File device]
I've been thinking about this one and I was wondering, this file I/O
stuff is quite unfamiliar territory to me, so bear with me ^^;
I enjoy the current device because I can stream content from files
without taking any space in the running program, for example, my wiki
only reads 1 byte at a time, and so the buffer has a length of 1,
which leaves me a lot of space to do other things with the program.
If we were to go back to a simpler file device like we had, with the
option of getting bytes at a specific address with a specific length,
would I just have to increment an address in my uxntal program, and
still read one byte at a time at that specific address, pretty much
the same way I did? Would it be much slower since now it would
open/close the file each time? Would I notice a difference?
If not, then I really don't mind spending the time to migrate the
tools I use to the simpler file system, it might even make some other
programs simpler. And I'd love it if the file device implementation
was more straight-forward.
Could we allocate a buffer of say, 16kb for directory listing? And get
this buffer on directory read?
One thing that I like with the simpler file device is that you can use
files as external memory banks, and read/write data at absolute
positions.
Re: File device and the debugger [was Re: Seeking the File device]
On Mon, Jun 13, 2022 at 08:00:03AM -0700, Hundred Rabbits wrote:
>I've been thinking about this one and I was wondering, this file I/O
>stuff is quite unfamiliar territory to me, so bear with me ^^;
>
>I enjoy the current device because I can stream content from files
>without taking any space in the running program, for example, my wiki
>only reads 1 byte at a time, and so the buffer has a length of 1,
>which leaves me a lot of space to do other things with the program.
>
>If we were to go back to a simpler file device like we had, with the
>option of getting bytes at a specific address with a specific length,
>would I just have to increment an address in my uxntal program, and
>still read one byte at a time at that specific address, pretty much
>the same way I did? Would it be much slower since now it would
>open/close the file each time? Would I notice a difference?
This has been playing on my mind for a while, because as you say, would
it be slower?
If we went back to the old implementation of opening, seeking, reading
and closing the file with each DEO then yes, it would be much slower
than what we have now.
But there's no real reason why that has to be the case!
On a DEO, the C code doesn't need to close it afterwards. On the next
DEO, the C code could check whether the filename has changed, and if
that's the case, then of course it'll need to open the new filename. But
it could keep track of a few FILE* pointers along with their filenames
and current position, so when a DEO comes in that reuses a file we can
just seek to the new offset. Or if the new offset happens to be where we
currently are, we don't need to seek either.
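A minimal sketch of that idea in C, tracking a single file rather than a few of them. The `CachedFile` struct and `cached_read` function are hypothetical names for illustration, not code from any emulator.

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    FILE *f;        /* NULL until the first read */
    char name[256]; /* filename the handle belongs to */
    long pos;       /* where the handle currently points */
} CachedFile;

/* Read `len` bytes at `offset` from `name`, reusing the open handle
 * when the name matches and skipping the seek when already in place.
 * Returns the number of bytes read. */
static size_t
cached_read(CachedFile *c, const char *name, long offset, void *dst, size_t len)
{
    size_t n;
    if(!c->f || strcmp(c->name, name) != 0) {
        if(c->f) fclose(c->f);
        c->f = fopen(name, "rb");
        if(!c->f) return 0;
        strncpy(c->name, name, sizeof(c->name) - 1);
        c->name[sizeof(c->name) - 1] = '\0';
        c->pos = 0;
    }
    if(offset != c->pos) {
        if(fseek(c->f, offset, SEEK_SET) != 0) return 0;
        c->pos = offset;
    }
    n = fread(dst, 1, len, c->f);
    c->pos += (long)n;
    return n;
}
```

A program streaming a file in consecutive chunks would hit neither the fopen nor the fseek branch after the first call.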
This approach raises the possibility of writing to a file then reading
the same file, without all the opens and closes of the File device we
use today. So if the file happens to be on a ramdisk in the host O/S, we
have our super fast caching mechanism.
This does seem to place a burden on the complexity of the C code, but the
implementation doesn't *have* to have all the above caching and can fall
back to the open-seek-read-close simple behaviour we once had. Indeed, a
lot of the complexity of today's File device is hidden behind the
workings of stdio.c, and while that's a nice luxury to have on systems
with the standard C library, on systems at the microcontroller end of
the spectrum we need to code these semantics ourselves. With this proposed
File device, a read could be as simple as finding the pointer to the
memory-mapped Flash device and memcpy'ing that over to Uxn RAM.
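For instance, a read on such a system might look like the following sketch. The function name `flash_read` and its parameters are assumptions for illustration: `flash` stands for the memory-mapped file contents and `ram` for the 64k of Uxn memory.

```c
#include <stdint.h>
#include <string.h>

/* Copy up to `len` bytes from a memory-mapped file image into Uxn RAM.
 * The copy is clipped to the end of the file and to the end of RAM.
 * Returns the number of bytes copied. */
static uint16_t
flash_read(uint8_t *ram, uint16_t addr, const uint8_t *flash,
           uint32_t flash_len, uint32_t offset, uint16_t len)
{
    uint32_t n = len;
    if(offset >= flash_len) return 0;
    if(n > flash_len - offset) n = flash_len - offset;
    if(n > 0x10000u - addr) n = 0x10000u - addr;
    memcpy(ram + addr, flash + offset, n);
    return (uint16_t)n;
}
```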
So like I said, it has been preoccupying my thoughts lately :-D
>Could we allocate a buffer of say, 16kb for directory listing? And get
>this buffer on directory read?
I think there could be ways around the directory listing stuff too -
much like how we keep track of files, we could have a struct with the
directory name, the DIR* pointer, the current entry (dirent* pointer)
and the File/offset that entry starts at. If the Uxntal behaves as you
expect and wants the next chunk of data, we already have it ready to go;
but if Uxntal wants random access to it we just rewind the DIR* pointer
and go through the directory entries again. It's slower but if the
Uxntal wants to do that, there's no reason why we can't accommodate it.
And once again, a simpler implementation doesn't need to provide the
caching - most applications (e.g. left) will read it all in one go
anyway.
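The directory cursor described above could be sketched in C as follows. Two things are assumed for illustration: each entry contributes its name plus a newline to the listing (the real listing format carries more than that), and readdir returns entries in the same order after rewinddir as long as the directory is unchanged. `DirCursor`, `dir_open` and `dir_seek` are hypothetical names.

```c
#include <dirent.h>
#include <string.h>

typedef struct {
    DIR *dir;
    struct dirent *de; /* entry whose line starts at `start`, NULL at end */
    long start;        /* listing offset where that entry begins */
} DirCursor;

/* Assumed line format: the entry name followed by a newline. */
static long
line_len(const struct dirent *de)
{
    return (long)strlen(de->d_name) + 1;
}

static int
dir_open(DirCursor *c, const char *path)
{
    c->dir = opendir(path);
    if(!c->dir) return 0;
    c->de = readdir(c->dir);
    c->start = 0;
    return 1;
}

/* Position the cursor on the entry containing listing byte `offset`.
 * Moving forwards is cheap; moving backwards rewinds and rescans.
 * Returns 0 when `offset` is past the end of the listing. */
static int
dir_seek(DirCursor *c, long offset)
{
    if(offset < c->start) {
        rewinddir(c->dir);
        c->de = readdir(c->dir);
        c->start = 0;
    }
    while(c->de && offset >= c->start + line_len(c->de)) {
        c->start += line_len(c->de);
        c->de = readdir(c->dir);
    }
    return c->de != NULL;
}
```

Sequential reads only ever take the forward path, so the common case costs one readdir per entry.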
Best wishes,
Andy
Re: File device and the debugger [was Re: Seeking the File device]
Sounds like we're all onboard with this new File design.
If we went this way, how would we read a file larger than 64kb? For
example, I have a slideshow program that I use a lot, which streams TGA
files and draws them but does not cache them because they are larger
than 64kb.
I can imagine that we could have a seek address that is 2 shorts?
Would it be worth making it capable of auto-incrementing itself?
Ah, one other thing it might be worth thinking about sandboxing
by-default while we're here. I would love it if the file device was
always sand-boxed, I think people are afraid of the damages this could
do and I tend to agree, I've been running a lot of other people's roms
recently and it's just a matter of time.
Re: File device and the debugger [was Re: Seeking the File device]
On Sun, Jun 19, 2022 at 16:28:35 -0700, Hundred Rabbits wrote:
> Ah, one other thing it might be worth thinking about sandboxing
> by-default while we're here. I would love it if the file device was
> always sand-boxed, I think people are afraid of the damages this could
> do and I tend to agree, I've been running a lot of other people's roms
> recently and it's just a matter of time.
I was thinking about this recently and I agree with what cancel
implemented in uxn32: the VM can only access the filesystem at and below
the current working directory. This is easy to implement with unveil(2)!
When implementing manually, both relative addresses (../../etc/foo) and
absolute (/etc/foo) should be parsed to check if they are inside the
sandbox. It might be useful for an emulator to print a warning if an
attempt is made to access files outside the sandbox, so that some bugs
aren't so confusing, or to notify you that the rom you're running is
possibly untrustworthy.
With this sandboxing, it is easy to completely cut off a VM from the
rest of the filesystem by running it from inside an empty directory -
the emulator itself isn't necessarily initially sandboxed.
    rm -fr foo
    mkdir foo
    cd foo
    uxnemu ../bar.rom
phoebos
Re: File device and the debugger [was Re: Seeking the File device]
On Sun, Jun 19, 2022 at 04:28:35PM -0700, Hundred Rabbits wrote:
>Sounds like we're all onboard with this new File design.
>
>If we went this way, how would we read a file larger than 64kb? For
>example, I have a slideshow program that I use a lot, which streams TGA
>files and draws them but does not cache them because they are larger
>than 64kb.
>
>I can imagine that we could have a seek address that is 2 shorts?
Yes, I think that keeping it to one short isn't that practical now we
have rich applications like your slideshow program.
The snag then is that we don't have room for File/stat, File/delete and
File/append any more. I don't believe that File/stat is really used so
that can just go (but if anyone reading this knows differently, please
let us know).
I wrote at the top of my subthread how append semantics worked in the
original File device. If /offset is zero, then the file is truncated,
otherwise it's opened in update mode (so writes beyond the end of file
will extend it). The biggest drawback I can see is that if you have a
large file, you can't overwrite the first byte without it truncating on
you. That can be worked around if necessary, but I think the greater
danger is a naïve program treating the file as random-access memory
(something we want to encourage) and updating the first block and
whoops, the rest of your data is gone.
Perhaps the /success short can be overloaded as binarycat's first
message in this thread suggested. It's a great idea! It's normally only
read so far, but we could write a non-zero value to one of the bytes to
truncate the file (to the length given in /offset) and write the other
byte to delete the file. Then we have the full gamut of operations
without strange behaviour if you want to write the first byte.
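A possible shape for this in C, on a POSIX host. Which byte of /success triggers which operation is not decided above, so the assignment here (byte 0 truncates, byte 1 deletes) is purely an assumption, as is the name `success_deo`.

```c
#define _XOPEN_SOURCE 700 /* exposes truncate(2) under strict -std= modes */
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handler for a write to one byte of File/success.
 * Assumption: byte 0 truncates the file to `offset` bytes and byte 1
 * deletes it. Writing zero to either byte does nothing.
 * Returns 1 if an operation was carried out successfully, else 0. */
static int
success_deo(const char *name, uint32_t offset, int byte, uint8_t value)
{
    if(value == 0) return 0;
    if(byte == 0) return truncate(name, (off_t)offset) == 0;
    return remove(name) == 0;
}
```

With truncation made explicit like this, a write at offset zero can simply update the first bytes in place.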
>Would it be worth making it capable of auto-incrementing itself?
Yeah, that's a genius idea! That makes using this File device for
streaming as convenient as the current one :-) It does mean that users
have to set the two /offset shorts back to zero if they want to switch
files, but considering all the other shorts that need setting anyway
(/name, /length, /read) it's something we can live with.
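The auto-increment itself is small. In this sketch, `port` is assumed to point at the four /offset bytes in device memory (two big-endian shorts), and `advance_offset` is a hypothetical name.

```c
#include <stdint.h>

/* Add the number of bytes just transferred to the 32-bit offset
 * held in the four /offset bytes, carrying between the two shorts. */
static void
advance_offset(uint8_t *port, uint16_t done)
{
    uint32_t v = ((uint32_t)port[0] << 24) | ((uint32_t)port[1] << 16) |
                 ((uint32_t)port[2] << 8) | (uint32_t)port[3];
    v += done;
    port[0] = (uint8_t)(v >> 24);
    port[1] = (uint8_t)(v >> 16);
    port[2] = (uint8_t)(v >> 8);
    port[3] = (uint8_t)v;
}
```

An emulator would call this after each read or write with the byte count it also reports in /success.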
>Ah, one other thing it might be worth thinking about sandboxing
>by-default while we're here. I would love it if the file device was
>always sand-boxed, I think people are afraid of the damages this could
>do and I tend to agree, I've been running a lot of other people's roms
>recently and it's just a matter of time.
Yes, this is definitely a good time to introduce the security we wanted,
since it makes introducing this breaking change more worthwhile. I'm
very happy that phoebos already has an idea for this so let's give them
(plus any other interested folx) space to explore what we can do here
:-)
Best wishes,
Andy
Re: File device and the debugger [was Re: Seeking the File device]
On Sun, Jun 19, 2022 at 18:39:47 -0700, Hundred Rabbits wrote:
> > This is easy to implement with unveil(2)!
>
> Do you think you could make a PR to uxn11? I'd love to take this for a spin.
unveil(2) is OpenBSD-specific, but I've put together a check using realpath(3),
which is POSIX. The patch is attached.
realpath(3) finds the _canonical_ version of a pathname, so,
/foo/../bar// becomes /bar, and symlinks are also resolved. However, it
only works for existing paths, so I had to make a wrapper function which
tries realpath, and if I get ENOENT then removes the last component and
tries again. The result is that I find the longest existing prefix of the
requested pathname, and this can be compared to the current working
directory.
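For readers without the attachment to hand, the approach described above can be sketched roughly as follows. This is not the attached patch: the function names `resolve_existing`, `inside_dir` and `path_allowed` are made up for illustration, and the same two UNIX assumptions about '/' apply.

```c
#define _XOPEN_SOURCE 700 /* exposes realpath(3) under strict -std= modes */
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Canonicalise the longest existing prefix of `path` into `out`,
 * which must hold PATH_MAX bytes. Returns 1 on success. */
static int
resolve_existing(const char *path, char *out)
{
    char tmp[PATH_MAX];
    char *slash;
    if(strlen(path) >= sizeof(tmp)) return 0;
    strcpy(tmp, path);
    for(;;) {
        if(realpath(tmp, out)) return 1;
        if(errno != ENOENT) return 0;
        slash = strrchr(tmp, '/');
        if(!slash) return realpath(".", out) != NULL;      /* relative path */
        if(slash == tmp) return realpath("/", out) != NULL; /* "/name" */
        *slash = '\0'; /* drop the last component and retry */
    }
}

/* True if canonical path `resolved` is `root` or lies beneath it. */
static int
inside_dir(const char *resolved, const char *root)
{
    size_t n = strlen(root);
    if(strcmp(root, "/") == 0) return 1;
    return strncmp(resolved, root, n) == 0 &&
           (resolved[n] == '\0' || resolved[n] == '/');
}

/* True if `path` stays at or below the current working directory. */
static int
path_allowed(const char *path)
{
    char cwd[PATH_MAX], real[PATH_MAX];
    if(!getcwd(cwd, sizeof(cwd))) return 0;
    if(!resolve_existing(path, real)) return 0;
    return inside_dir(real, cwd);
}
```

One limitation of this sketch: realpath also fails with ENOENT on a dangling symbolic link, so such a link would be treated as a file that does not exist yet. A fuller check would need to handle that case.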
This roundabout method was the cleanest method in my opinion. Other
methods would be:
* writing a full version of realpath, but changing it to work as best as
possible with non-existent paths. This was my initial attempt but the
code is unclear, long, and less understandable than the attached
patch. However, if anyone wants to pursue this route, I can share the
function which I almost finished writing (it took me a few hours today!).
* if the file doesn't exist, create it before calling realpath and then
unlink it afterwards. This is messy if a lot of nested directories
have to be created and then deleted.
There are improvements to be made: there are two UNIX assumptions in this patch:
that absolute pathnames begin with '/', and that path components are separated by
'/'. There are notes in the code pointing out where these are made, and they
could be fairly easily protected by some #ifdefs so that the appropriate
values for other systems can be used.
There is a second patch attached, which allows the load_rom function to
access files outside the sandbox, so that the workflow I previously
described is possible.
Please let me know what you think!
phoebos
Re: File device and the debugger [was Re: Seeking the File device]
Damn, this is a larger patch than I expected.
Excuse my naivety, but I was thinking, couldn't we just catch paths
that contain either `..` or `~/`? Is there any other way to exit the
working folder than those two?
Re: File device and the debugger [was Re: Seeking the File device]
On Mon, Jun 20, 2022 at 07:39:32PM -0700, Hundred Rabbits wrote:
> Excuse my naivety, but I was thinking, couldn't we just catch paths
> that contain either `..` or `~/`? Is there any other way to exit the
> working folder than those two?
On systems that have them you'd be able to use symbolic links to
escape the working folder without having to mention `..` or `~`.
-- Erik
Re: File device and the debugger [was Re: Seeking the File device]
On Mon, Jun 20, 2022 at 23:40:57 -0400, Erik Osheim wrote:
> On Mon, Jun 20, 2022 at 07:39:32PM -0700, Hundred Rabbits wrote:
> > Excuse my naivety, but I was thinking, couldn't we just catch paths
> > that contain either `..` or `~/`? Is there any other way to exit the
> > working folder than those two?
>
> On systems that have them you'd be able to use symbolic links to
> escape the working folder without having to mention `..` or `~`.
Yes, I could use a symbolic link or an absolute address. '~' is expanded
by the shell and wouldn't work anyway.
I didn't want to just block paths containing '..', because it's a
perfectly valid thing to use if you've gone into one directory and want
a simple way to get up one, even without leaving the sandbox.
phoebos
Re: File device and the debugger [was Re: Seeking the File device]
On Tue, Jun 21, 2022 at 07:43:10AM -0700, Hundred Rabbits wrote:
>Gotcha, thanks for the explanation :) I will merge the patch today and
>experiment.
I see the patch restricts File access to the current directory and
below — well done!
I wonder what the appetite is to restrict it further so that if it's run
in the home directory, it changes directory to e.g. ~/.config/uxn after
reading the ROM. My threat model here is that running Uxn from inside
~/uxn is safer, but when it's run directly in ${HOME} then it can
read/delete my ~/.ssh/ private keys.
This is technically doable with little extra work, but I was interested
in people's thoughts about the potential for confusion/annoyance and
whether that weighs more heavily than the security benefit.
Best wishes,
Andy
Re: File device and the debugger [was Re: Seeking the File device]
On Wed, Jun 22, 2022 at 01:01:18 +0100, Andrew Alderwick wrote:
> I see the patch restricts File access to the current directory and
> below — well done!
:)
> I wonder what the appetite is to restrict it further so that if it's run in
> the home directory, it changes directory to e.g. ~/.config/uxn after reading
> the ROM. My threat model here is that running Uxn from inside ~/uxn is
> safer, but when it's run directly in ${HOME} then it can read/delete my
> ~/.ssh/ private keys.
>
> This is technically doable with little extra work, but I was interested in
> people's thoughts about the potential for confusion/annoyance and whether
> that weighs more heavily than the security benefit.
Interesting idea. That would obviously be a bit annoying for some things
- if I want to edit a file in $HOME using left, I'd have to copy the
file to some other directory below $HOME and work with it there.
However, it's a serious threat, since testing for an .ssh/ dir is simple,
although without any networking, hopefully the worst that can happen is
your secret keys are deleted.
Maybe a flag could turn on such a feature - or turn it off, for programs
like left.
The code would be something like:
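    load_rom(u, argv[1]);
    ...
    char cwd[PATH_MAX], *home;
    home = getenv("HOME");
    /* only relocate when started directly in $HOME */
    if (getcwd(cwd, PATH_MAX) && home && strcmp(home, cwd) == 0) {
        /* mkdir may fail because the directory already exists; that's fine */
        mkdir(".cache", 0755);
        mkdir(".cache/uxn", 0755);
        if (chdir(".cache/uxn") != 0)
            perror("chdir");
    }
    /* start ... */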
phoebos
Re: File device and the debugger [was Re: Seeking the File device]
On Wed, Jun 22, 2022 at 01:01:18 +0100, Andrew Alderwick wrote:
> I wonder what the appetite is to restrict it further so that if it's run in
> the home directory, it changes directory to e.g. ~/.config/uxn after reading
> the ROM. My threat model here is that running Uxn from inside ~/uxn is
> safer, but when it's run directly in ${HOME} then it can read/delete my
> ~/.ssh/ private keys.
On second thought, this would be quite confusing behaviour to any program
using the File device. Perhaps a better method would be to include in
the emulator code a list of paths which are forbidden to be accessed,
such as $HOME/.ssh . This list could be hard-coded or read from a file.
We'd have implemented something like unveil(2) (but the opposite!).
phoebos
Re: File device and the debugger [was Re: Seeking the File device]
> I wonder what the appetite is to restrict it further so that if it's run
> in the home directory, it changes directory to e.g. ~/.config/uxn after
> reading the ROM. My threat model here is that running Uxn from inside
> ~/uxn is safer, but when it's run directly in ${HOME} then it can
> read/delete my ~/.ssh/ private keys.
>
> This is technically doable with little extra work, but I was interested
> in people's thoughts about the potential for confusion/annoyance and
> whether that weighs more heavily than the security benefit.
I understand the motivation, but I think it should be sufficient to give
the user the option to run "sandboxed", i.e. by running the emulator
in a subdirectory. Trying to come up with a suitable "blacklist" is hard -
where does it end? What if users have non-standard locations for
their keys? What about sensitive data in non-standard locations?
Such "convenience" features as automatic cache creation and
accessibility lists can quickly become a burden and a security problem
on their own.
Just my 2 cents.
felix
Re: File device and the debugger [was Re: Seeking the File device]
hey phoebos,
just wanted to say thanks for the sandboxing patch, works amazingly
well. I've been testing it for the past two days and I haven't found a
way to exit it.
I had to change my workflow on how I do a few things, namely how my
wiki was generated but I think it's all changes for the better. So
yeah, thanks :)
Dll
Re: File device and the debugger [was Re: Seeking the File device]
On Wed, Jun 22, 2022 at 17:21:38 -0700, Hundred Rabbits wrote:
> just wanted to say thanks for the sandboxing patch, works amazingly
> well. I've been testing it for the past two days and I haven't found a
> way to exit it.
That's great to hear! I spent almost a full day on it so it's nice that
it's appreciated. Unfortunately it might require some rewriting if/when
the File devices are changed.
phoebos