~skeeto/public-inbox

Re: Mapping Multiple Memory Views in User Space

Eliot Miranda
Message ID: <48161E08-34C1-4C52-85B3-D5C13F5CC17E@gmail.com>
Hi Chris,

    thanks very much for this.  I’ve used your solution to solve memory protection issues in a JIT-compiled Smalltalk VM.  On ARMv8 under the Manjaro Linux distribution, the system is set up to disallow any execution of code within a writable region of memory.  This effectively disables a number of important techniques based on instruction modification, known as “inline caching”.

With inline caching, the results of past message lookups are recorded in code as pairs of a register load and a call instruction.  Originally, a yet-to-be-executed send is generated as a register load of a message selector (the address of a unique string) and a call to a lookup routine.  When the lookup routine is executed it finds a target method, causes it to be JITted if it has not been already, and modifies the original two instructions into a load of the current receiver’s class and a call to the checked entry-point of the JITted target method.  On subsequent execution the entry-point fetches the class of the current receiver and compares it with the class loaded into the register at the send site.  If they agree, then the message is being sent to an instance of the same class and execution continues.  This is known as a monomorphic inline cache (MIC) because it caches a single class.
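The linking logic described above can be modelled in a few lines of C. This is only a sketch of the control flow, under loud assumptions: the real VM patches two machine instructions in place, whereas here a hypothetical `SendSite` struct stands in for them, and every name (`SendSite`, `mic_send`, `full_lookup`, `mic_demo`) is invented for illustration. Each toy class has exactly one method so that the full lookup stays trivial.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Class Class;
typedef struct Object { const Class *cls; } Object;
typedef int (*Method)(Object *receiver);

/* Toy class with exactly one method (an assumption made for brevity). */
struct Class {
    const char *selector;   /* unique string, compared by address */
    Method method;
};

/* Stand-in for the two patched instructions at one send site. */
typedef struct SendSite {
    const char *selector;       /* unlinked state: selector to look up */
    const Class *cached_class;  /* linked state: class loaded at the site */
    Method target;              /* linked state: callee; NULL while unlinked */
    int full_lookups;           /* how often the slow path ran */
} SendSite;

static Method full_lookup(SendSite *site, const Class *cls) {
    site->full_lookups++;
    assert(cls->selector == site->selector);  /* toy model: always understood */
    return cls->method;
}

int mic_send(SendSite *site, Object *receiver) {
    if (site->target == NULL) {
        /* First execution: look up, then "patch" the site. */
        site->target = full_lookup(site, receiver->cls);
        site->cached_class = receiver->cls;
        return site->target(receiver);
    }
    if (site->cached_class == receiver->cls) {
        /* Checked entry-point: classes agree, execution continues. */
        return site->target(receiver);
    }
    /* Miss: this model just falls back to a full lookup. */
    return full_lookup(site, receiver->cls)(receiver);
}

static const char sel_area[] = "area";
static int square_area(Object *r) { (void)r; return 4; }
static int circle_area(Object *r) { (void)r; return 3; }

/* Sends "area" 100 times to a square, then once to a circle. */
int mic_demo(void) {
    Class square = { sel_area, square_area };
    Class circle = { sel_area, circle_area };
    Object s = { &square }, c = { &circle };
    SendSite site = { sel_area, NULL, NULL, 0 };
    for (int i = 0; i < 100; i++)
        (void)mic_send(&site, &s);   /* one lookup, then 99 cache hits */
    (void)mic_send(&site, &c);       /* class differs: second lookup */
    return site.full_lookups;
}
```

In this model the hundred sends to the same class cost a single full lookup; only the send to a receiver of a different class takes the slow path again.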

When they disagree, a small jump table is allocated that starts with the code to extract the class of the current receiver.  The table then compares the class against that loaded at the send site, jumping to the first target method if they match, followed by a second comparison and a jump to the second target.  If execution falls through then a routine is called that extends the jump table with a new case.  This is known as a polymorphic inline cache (PIC), because it deals with some small number of cases.  I call these closed PICs because the set of classes is closed.
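A closed PIC can be sketched the same way, again as a data-structure model rather than generated machine code. The capacity of four cases, the `ClosedPIC` layout, and the names `pic_send` and `pic_demo` are all assumptions made for illustration; the message does not say how many cases the real jump table holds.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Class Class;
typedef struct Object { const Class *cls; } Object;
typedef int (*Method)(Object *receiver);

/* Toy class: a tag plus its single method (assumption for brevity). */
struct Class { int tag; Method method; };

enum { PIC_CAPACITY = 4 };  /* hypothetical limit on the number of cases */

/* Stand-in for the jump table: one (class, target) pair per case. */
typedef struct ClosedPIC {
    int count;
    struct { const Class *cls; Method target; } cases[PIC_CAPACITY];
} ClosedPIC;

/* Returns 1 and stores the result if the send was dispatched.
 * Returns 0 if the table is full and the class is not in it. */
int pic_send(ClosedPIC *pic, Object *receiver, int *result) {
    const Class *cls = receiver->cls;        /* extract the class once */
    for (int i = 0; i < pic->count; i++) {   /* compare-and-jump chain */
        if (pic->cases[i].cls == cls) {
            *result = pic->cases[i].target(receiver);
            return 1;
        }
    }
    if (pic->count == PIC_CAPACITY)
        return 0;                            /* closed set exhausted */
    /* Fell through: extend the table with a new case. */
    pic->cases[pic->count].cls = cls;
    pic->cases[pic->count].target = cls->method;  /* stands in for a lookup */
    pic->count++;
    *result = cls->method(receiver);
    return 1;
}

static int tag_of(Object *r) { return r->cls->tag; }

/* Fills the PIC with four classes, then tries a fifth.
 * Returns the case count, or -1 if anything went unexpectedly. */
int pic_demo(void) {
    Class classes[PIC_CAPACITY + 1];
    ClosedPIC pic = { 0 };
    int result = 0;
    for (int i = 0; i <= PIC_CAPACITY; i++) {
        classes[i].tag = i;
        classes[i].method = tag_of;
    }
    for (int round = 0; round < 2; round++) {
        for (int i = 0; i < PIC_CAPACITY; i++) {
            Object o = { &classes[i] };
            if (!pic_send(&pic, &o, &result) || result != i)
                return -1;
        }
    }
    Object fifth = { &classes[PIC_CAPACITY] };
    if (pic_send(&pic, &fifth, &result))
        return -1;                           /* should not fit */
    return pic.count;
}
```

The second round of sends adds no new cases, and the fifth class is rejected because the table is full.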

If a PIC fills up then a small piece of code is generated to perform a hashed lookup into the first-level method lookup cache, a table mapping (class, selector) pairs to target methods.  This is more efficient than a normal first-level method lookup because the message selector and its hash are known at code generation time, which also reduces register pressure.  I call these Open PICs because they handle an open set of classes.
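The point about the selector hash being known at code generation time can be illustrated with a small model. Everything here is an assumption for the sake of the sketch: the cache size, the pointer-based hash function, the direct-mapped layout, and the names `OpenPIC`, `open_pic_make`, `open_pic_send` and `open_pic_demo`. The VM's actual cache layout and hash are not described in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct Class Class;
typedef struct Object { const Class *cls; } Object;
typedef int (*Method)(Object *receiver);

/* Toy class with one method (assumption for brevity). */
struct Class { int tag; const char *selector; Method method; };

enum { CACHE_SIZE = 256 };  /* hypothetical; must be a power of two */

/* First-level method lookup cache: (class, selector) -> method. */
typedef struct CacheEntry {
    const Class *cls;
    const char *selector;
    Method method;
} CacheEntry;

static CacheEntry method_cache[CACHE_SIZE];
static int cache_misses;

/* Hypothetical hash over an address. */
static uint32_t hash_ptr(const void *p) {
    return (uint32_t)((uintptr_t)p >> 3) * 2654435761u;
}

/* The selector and its hash are fixed when the open PIC is generated. */
typedef struct OpenPIC {
    const char *selector;
    uint32_t selector_hash;
} OpenPIC;

OpenPIC open_pic_make(const char *selector) {
    OpenPIC pic = { selector, hash_ptr(selector) };
    return pic;
}

int open_pic_send(const OpenPIC *pic, Object *receiver) {
    const Class *cls = receiver->cls;
    /* Only the class hash is computed at run time. */
    uint32_t index = (hash_ptr(cls) ^ pic->selector_hash) & (CACHE_SIZE - 1);
    CacheEntry *e = &method_cache[index];
    if (e->cls != cls || e->selector != pic->selector) {
        cache_misses++;                           /* slow path */
        assert(cls->selector == pic->selector);   /* toy: always understood */
        e->cls = cls;
        e->selector = pic->selector;
        e->method = cls->method;
    }
    return e->method(receiver);
}

static const char sel_tag[] = "tag";
static int tag_of(Object *r) { return r->cls->tag; }

/* Sends "tag" ten times to each of three classes, one class at a time.
 * Returns the sum of the results and reports the slow-path count. */
int open_pic_demo(int *misses) {
    static const Class classes[3] = {
        { 1, sel_tag, tag_of }, { 2, sel_tag, tag_of }, { 3, sel_tag, tag_of }
    };
    memset(method_cache, 0, sizeof method_cache);
    cache_misses = 0;
    OpenPIC pic = open_pic_make(sel_tag);
    int sum = 0;
    for (int c = 0; c < 3; c++) {
        Object o = { &classes[c] };
        for (int i = 0; i < 10; i++)
            sum += open_pic_send(&pic, &o);
    }
    *misses = cache_misses;
    return sum;
}
```

Because each entry is verified against both class and selector, a hash collision in this model only costs an extra slow-path lookup and never dispatches to the wrong method.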

So fine-grained instruction space modification is key to performance, and hence being prevented from executing code in writable memory is extremely problematic.  Indirect solutions, where message sends are cached in writable memory, impact performance because they imply indirect calls that defeat a processor’s instruction prefetch logic, unlike the MICs and PICs above.

Using your solution I was able to map the JIT’s code zone twice, once as read-execute and once as read-write, and change the instruction modification logic to perform all writes through the writable mapping.  Again, thank you very much for this.
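For readers who want to see the shape of such a double mapping, here is a minimal Linux-only sketch. It is not the VM's code and not necessarily the exact method from the original article: it assumes an anonymous in-memory file obtained with the `memfd_create` system call (invoked through `syscall(2)` so the sketch builds without `_GNU_SOURCE`), and the `CodeZone` type and `code_zone_*` function names are hypothetical. The instruction cache flush uses the GCC/Clang builtin `__builtin___clear_cache`, and flushing through the executable alias is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Two views of the same physical pages (hypothetical type). */
typedef struct CodeZone {
    unsigned char *rw;         /* read-write view: all writes go here */
    const unsigned char *rx;   /* read-execute view: code runs from here */
    size_t size;
} CodeZone;

/* Maps `size` bytes twice. Returns 0 on success, -1 on failure. */
int code_zone_map(CodeZone *zone, size_t size) {
    /* Anonymous in-memory file backing both mappings (Linux 3.17+). */
    int fd = (int)syscall(SYS_memfd_create, "code-zone", 0u);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    void *rw = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *rx = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
    close(fd);  /* the mappings keep the memory alive */
    if (rw == MAP_FAILED || rx == MAP_FAILED) {
        if (rw != MAP_FAILED) munmap(rw, size);
        if (rx != MAP_FAILED) munmap(rx, size);
        return -1;
    }
    zone->rw = rw;
    zone->rx = rx;
    zone->size = size;
    return 0;
}

/* Writes through the writable view, then flushes the instruction
 * cache for the matching range of the executable view. */
void code_zone_patch(CodeZone *zone, size_t offset,
                     const void *bytes, size_t len) {
    assert(offset <= zone->size && len <= zone->size - offset);
    memcpy(zone->rw + offset, bytes, len);
    char *begin = (char *)zone->rx + offset;
    __builtin___clear_cache(begin, begin + len);
}

void code_zone_unmap(CodeZone *zone) {
    munmap(zone->rw, zone->size);
    munmap((void *)zone->rx, zone->size);
}
```

No page in this sketch is ever writable and executable at the same virtual address, which is the property the system policy enforces. A caller would emit or patch instructions with `code_zone_patch` and compute call targets from `zone->rx`.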

P.S. The code for the Smalltalk VM is written in Smalltalk, available via http://source.squeak.org/VMMaker/, and the generated C code from which production VMs are built lives at https://github.com/OpenSmalltalk/opensmalltalk-vm.  Articles on the VM’s development are available at http://www.mirandabanda.org/cogblog/