On Tue, 24 Jan 2017, James Hogan wrote:

> > All the critical data structures would have to be outside the EVA
> > overlap.
>
> This in itself is awkward.  If a SoC supports multiple RAM sizes, e.g.
> up to 2GB, you might want a single EVA kernel that could support both.
> Normally you could just go with a 2GB compatible layout (for the sake of
> argument, let's say RAM cached @ kernel VA 0x40000000 .. 0xBFFFFFFF,
> ignoring BEV overlays for now), but if less than 1GB is fitted then none
> of that RAM is outside of the user overlap range.

 Well, the kernel is in control of user mappings and can take a piece of
the virtual address space away for internal use.  At worst the kernel can
map the necessary stuff in KSEG2 with a wired TLB entry.  I agree this is
far from pretty though, and I do hope my other proposal turns out to be
feasible.

> > Poking at the ASID as I described above is just a couple of
> > instructions at entry and exit, and the rest would only be done if
> > tracing is active.  Plus you don't actually have to move anything
> > away, except for the final ERET, though likely not even that, owing to
> > the delayed nature of an ASID update.
>
> Probably, so long as you ignore QEMU.

 We can paper it over in QEMU I suppose -- we're not supposed to be
interrupted at EXL, and with a linear execution flow any sane hardware
will have already fetched the following ERET by the time the immediately
preceding MTC0 has been retired.  We can cache-line-align the instruction
pair to avoid surprises on real hardware too.

> > So can you find a flaw in my proposal so far?
>
> Not a functional one.

 Good!

> > We'll have to think about the TLB refill handler yet though.
>
> A deferred watch from the refill handler (e.g. page tables) would, I
> think, trigger an immediate watch exception on ERET, and get cleared /
> ignored.
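 To illustrate the MTC0/ERET pair I have in mind -- this is only a sketch
and the register choice ($k1 holding the ASID to restore), the 32-byte
cache line size and the use of $10 (CP0.EntryHi) as the ASID destination
are my assumptions here, not tested code:

	.align	5		# keep the pair within one 32-byte cache line
	mtc0	$k1, $10	# restore the user ASID into CP0.EntryHi
	eret			# ERET clears the execution hazard for us

 The alignment directive is what I mean by cache-line-aligning the pair:
both instructions are then guaranteed to have been fetched together before
the MTC0 retires.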
> It would probably make enough of a timing difference for userland to
> reliably detect (in order to probe where the process's page tables are
> in kernel virtual memory, to be able to mount a more successful attack
> given some other vulnerability).

 I feel uneasy about it: if a watchpoint happens to be badly placed (not
necessarily deliberately), then this could become a serious performance
hit for the whole system (and if placed deliberately, then possibly a
security concern).

 However I think we can get away quite easily again, by clearing
CP0.Cause.WP unconditionally at the exit from the refill handler --
there's nothing of interest to watchpoints throughout the handler and we
run at EXL until completion.  Unfortunately some other writable bits have
been allocated in CP0.Cause, specifically DC and especially IV, so we
can't just do:

	mtc0	$zero, $13

 However if we can prove that we won't need the IP[1:0] bits in scenarios
that involve a TLB refill, then we could just quickly do a short sequence,
say:

	lui	$k0, 0x80	# bit 23: CP0.Cause.IV
	mtc0	$k0, $13
	eret

 Otherwise we'll have to do a full RMW sequence; fortunately a single INS
from $0 will do here again to clear CP0.Cause.WP and keep the remaining
bits intact.

 Maybe we could do just the same in the regular exception epilogue, to
avoid the dependency on a hazard (and consequently an issue with QEMU).
All these extra operations do cost some performance, but at least the
latency is constant and therefore very predictable, which I believe is
important in some use cases.

  Maciej
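 PS.  For completeness, the full RMW variant with the INS from $0 that I
mentioned might look like this -- a sketch only, with the $k0 scratch
register being my assumption:

	mfc0	$k0, $13		# read CP0.Cause
	ins	$k0, $zero, 22, 1	# clear bit 22: CP0.Cause.WP
	mtc0	$k0, $13		# write back, all other bits kept
	eret

 This obviously requires MIPS32r2 for INS, but so does the single-field
update it relies on.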