Re: [RFC 00/10] Process-local memory allocations for hiding KVM secrets

Nadav Amit <namit@xxxxxxxxxx> · Thu, 13 Jun 2019 01:50:46 +0000

> On Jun 12, 2019, at 6:30 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> 
> On Wed, Jun 12, 2019 at 1:27 PM Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>> On Jun 12, 2019, at 12:55 PM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>>> 
>>>> On 6/12/19 10:08 AM, Marius Hillenbrand wrote:
>>>> This patch series proposes to introduce a region for what we call
>>>> process-local memory into the kernel's virtual address space.
>>> 
>>> It might be fun to cc some x86 folks on this series.  They might have
>>> some relevant opinions. ;)
>>> 
>>> A few high-level questions:
>>> 
>>> Why go to all this trouble to hide guest state like registers if all the
>>> guest data itself is still mapped?
>>> 
>>> Where's the context-switching code?  Did I just miss it?
>>> 
>>> We've discussed having per-cpu page tables where a given PGD is only in
>>> use from one CPU at a time.  I *think* this scheme still works in such a
>>> case, it just adds one more PGD entry that would have to be context-switched.
>> 
>> Fair warning: Linus is on record as absolutely hating this idea. He might change his mind, but it’s an uphill battle.
> 
> I looked at the patch, and it (sensibly) has nothing to do with
> per-cpu PGDs.  So it's in great shape!
> 
> Seriously, though, here are some very high-level review comments:
> 
> Please don't call it "process local", since "process" is meaningless.
> Call it "mm local" or something like that.
> 
> We already have a per-mm kernel mapping: the LDT.  So please nix all
> the code that adds a new VA region, etc, except to the extent that
> some of it consists of valid cleanups in and of itself.  Instead,
> please refactor the LDT code (arch/x86/kernel/ldt.c, mainly) to make
> it use a more general "mm local" address range, and then reuse the
> same infrastructure for other fancy things.  The code that makes it
> KASLR-able should be in its very own patch that applies *after* the
> code that makes it all work so that, when the KASLR part causes a
> crash, we can bisect it.
> 
> +	/*
> +	 * Faults in process-local memory may be caused by process-local
> +	 * addresses leaking into other contexts.
> +	 * tbd: warn and handle gracefully.
> +	 */
> +	if (unlikely(fault_in_process_local(address))) {
> +		pr_err("page fault in PROCLOCAL at %lx", address);
> +		force_sig_fault(SIGSEGV, SEGV_MAPERR, (void __user *)address, current);
> +	}
> +
> 
> Huh?  Either it's an OOPS or you shouldn't print any special
> debugging.  As it is, you're just blatantly leaking the address of the
> mm-local range to malicious user programs.
> 
> Also, you should IMO consider using this mechanism for kmap_atomic().
> Hi, Nadav!

Well, some context for the “hi” would have been helpful. (Do I have a bug
and I still don’t understand it?)

Perhaps you are referring to a use case for a similar mechanism that I
mentioned before. I did implement something similar (but not the way that
you wanted) to improve the performance of seccomp and system calls when
retpolines are used. I set up a per-mm code area holding code that used
direct calls to invoke seccomp filters and frequently used system calls.

My mechanism, I think, is not well suited to this use case. I needed my
code page to be within the same 2GB range as the kernel text/modules, which
does complicate things. For the same reason, it is also limited in the size
of the data/code that it can hold.
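
To illustrate the 2GB constraint: an x86-64 direct call (opcode E8) encodes
its target as a signed 32-bit displacement relative to the end of the
instruction, so the code page and its call targets have to sit within
roughly 2GB of each other. A minimal sketch of that reachability check
follows; call_rel32_reachable() and CALL_REL32_LEN are hypothetical names
used only for illustration and are not taken from any patch in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Length of an x86-64 near call: opcode E8 followed by a rel32 (assumed name). */
#define CALL_REL32_LEN 5

/*
 * Hypothetical helper: return true if a direct call placed at call_site
 * can reach target. The displacement is measured from the end of the
 * call instruction and must fit in a signed 32-bit value (about +/-2GB).
 */
static bool call_rel32_reachable(uint64_t call_site, uint64_t target)
{
	int64_t disp = (int64_t)(target - (call_site + CALL_REL32_LEN));

	return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

With illustrative addresses, a call site at 0xffffffff81000000 can reach
0xffffffffa0000000, while a call site at 0xffffc90000000000 cannot reach
0xffffffff81000000 and would need an indirect call instead.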