On Tue, Nov 13, 2018 at 10:26 AM Igor Stoppa <igor.stoppa@xxxxxxxxxx> wrote: > > On 13/11/2018 19:16, Andy Lutomirski wrote: > > > should be > > entirely abstracted away by an appropriate API, so neither SELinux nor > > IMA need to be aware that there's an mm_struct involved. > > Yes, that is fine. In my proposal I was thinking about tying it to the > core/thread that performs the actual write. > > The high level API could be something like: > > wr_memcpy(void *src, void *dst, uint_t size) > > > It's also > > entirely possible that some architectures won't even use an mm_struct > > behind the scenes -- x86, for example, could have avoided it if there > > were a kernel equivalent of PKRU. Sadly, there isn't. > > The mm_struct - or whatever is the means to do the write on that > architecture - can be kept hidden from the API. > > But the reason why I was proposing to have one mm_struct per writer is > that, iiuc, the secondary mapping is created in the secondary mm_struct > for each writer using it. > > So the updating of IMA measurements would have, theoretically, also > write access to the SELinux AVC. Which I was trying to avoid. > And similarly any other write rare updater. Is this correct? If you call a wr_memcpy() function with the signature you suggested, then you can overwrite any memory of this type. Having a different mm_struct under the hood makes no difference. As far as I'm concerned, for *dynamically allocated* rare-writable memory, you might as well allocate all the paging structures at the same time, so the mm_struct will always contain the mappings. If there are serious bugs in wr_memcpy() that cause it to write to the wrong place, we have bigger problems. I can imagine that we'd want a *typed* wr_memcpy()-like API some day, but that can wait for v2. And it still doesn't obviously need multiple mm_structs. > > >> 2) Iiuc, the purpose of the 2 pages being remapped is that the target of > >> the patch might spill across the page boundary, however if I deal with > >> the modification of generic data, I shouldn't (shouldn't I?) assume that > >> the data will not span across multiple pages. > > > > The reason for the particular architecture of text_poke() is to avoid > > memory allocation to get it working. i think that prmem/rare_write > > should have each rare-writable kernel address map to a unique user > > address, possibly just by offsetting everything by a constant. For > > rare_write, you don't actually need it to work as such until fairly > > late in boot, since the rare_writable data will just be writable early > > on. > > Yes, that is true. I think it's safe to assume, from an attack pattern, > that as long as user space is not started, the system can be considered > ok. Even user-space code run from initrd should be ok, since it can be > bundled (and signed) as a single binary with the kernel. > > Modules loaded from a regular filesystem are a bit more risky, because > an attack might inject a rogue key in the key-ring and use it to load > malicious modules. If a malicious module is loaded, the game is over. > > >> If the data spans across multiple pages, in unknown amount, I suppose > >> that I should not keep interrupts disabled for an unknown time, as it > >> would hurt preemption. > >> > >> What I thought, in my initial patch-set, was to iterate over each page > >> that must be written to, in a loop, re-enabling interrupts in-between > >> iterations, to give pending interrupts a chance to be served. > >> > >> This would mean that the data being written to would not be consistent, > >> but it's a problem that would have to be addressed anyways, since it can > >> be still read by other cores, while the write is ongoing. > > > > This probably makes sense, except that enabling and disabling > > interrupts means you also need to restore the original mm_struct (most > > likely), which is slow. I don't think there's a generic way to check > > whether in interrupt is pending without turning interrupts on. > > The only "excuse" I have is that write_rare is opt-in and is "rare". > Maybe the enabling/disabling of interrupts - and the consequent switch > of mm_struct - could be somehow tied to the latency configuration? > > If preemption is disabled, the expectations on the system latency are > anyway more relaxed. > > But I'm not sure how it would work against I/O. I think it's entirely reasonable for the API to internally break up very large memcpys.