Re: [PATCH 10/17] prmem: documentation

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Tue, 13 Nov 2018 10:35:53 -0800

On Tue, Nov 13, 2018 at 10:26 AM Igor Stoppa <igor.stoppa@xxxxxxxxxx> wrote:
>
> On 13/11/2018 19:16, Andy Lutomirski wrote:
>
> > should be
> > entirely abstracted away by an appropriate API, so neither SELinux nor
> > IMA need to be aware that there's an mm_struct involved.
>
> Yes, that is fine. In my proposal I was thinking about tying it to the
> core/thread that performs the actual write.
>
> The high level API could be something like:
>
> wr_memcpy(void *src, void *dst, uint_t size)
>
> >  It's also
> > entirely possible that some architectures won't even use an mm_struct
> > behind the scenes -- x86, for example, could have avoided it if there
> > were a kernel equivalent of PKRU.  Sadly, there isn't.
>
> The mm_struct - or whatever is the means to do the write on that
> architecture - can be kept hidden from the API.
>
> But the reason why I was proposing to have one mm_struct per writer is
> that, iiuc, the secondary mapping is created in the secondary mm_struct
> for each writer using it.
>
> So the updating of IMA measurements would have, theoretically, also
> write access to the SELinux AVC. Which I was trying to avoid.
> And similarly any other write rare updater. Is this correct?

If you call a wr_memcpy() function with the signature you suggested,
then you can overwrite any memory of this type.  Having a different
mm_struct under the hood makes no difference.  As far as I'm
concerned, for *dynamically allocated* rare-writable memory, you might
as well allocate all the paging structures at the same time, so the
mm_struct will always contain the mappings.  If there are serious bugs
in wr_memcpy() that cause it to write to the wrong place, we have
bigger problems.

I can imagine that we'd want a *typed* wr_memcpy()-like API some day,
but that can wait for v2.  And it still doesn't obviously need
multiple mm_structs.

>
> >> 2) Iiuc, the purpose of the 2 pages being remapped is that the target of
> >> the patch might spill across the page boundary, however if I deal with
> >> the modification of generic data, I shouldn't (shouldn't I?) assume that
> >> the data will not span across multiple pages.
> >
> > The reason for the particular architecture of text_poke() is to avoid
> > memory allocation to get it working.  i think that prmem/rare_write
> > should have each rare-writable kernel address map to a unique user
> > address, possibly just by offsetting everything by a constant.  For
> > rare_write, you don't actually need it to work as such until fairly
> > late in boot, since the rare_writable data will just be writable early
> > on.
>
> Yes, that is true. I think it's safe to assume, from an attack pattern,
> that as long as user space is not started, the system can be considered
> ok. Even user-space code run from initrd should be ok, since it can be
> bundled (and signed) as a single binary with the kernel.
>
> Modules loaded from a regular filesystem are a bit more risky, because
> an attack might inject a rogue key in the key-ring and use it to load
> malicious modules.

If a malicious module is loaded, the game is over.

>
> >> If the data spans across multiple pages, in unknown amount, I suppose
> >> that I should not keep interrupts disabled for an unknown time, as it
> >> would hurt preemption.
> >>
> >> What I thought, in my initial patch-set, was to iterate over each page
> >> that must be written to, in a loop, re-enabling interrupts in-between
> >> iterations, to give pending interrupts a chance to be served.
> >>
> >> This would mean that the data being written to would not be consistent,
> >> but it's a problem that would have to be addressed anyways, since it can
> >> be still read by other cores, while the write is ongoing.
> >
> > This probably makes sense, except that enabling and disabling
> > interrupts means you also need to restore the original mm_struct (most
> > likely), which is slow.  I don't think there's a generic way to check
> > whether in interrupt is pending without turning interrupts on.
>
> The only "excuse" I have is that write_rare is opt-in and is "rare".
> Maybe the enabling/disabling of interrupts - and the consequent switch
> of mm_struct - could be somehow tied to the latency configuration?
>
> If preemption is disabled, the expectations on the system latency are
> anyway more relaxed.
>
> But I'm not sure how it would work against I/O.

I think it's entirely reasonable for the API to internally break up
very large memcpys.