On Wed, Jun 12, 2019 at 6:50 PM Nadav Amit <namit@xxxxxxxxxx> wrote:
>
> > On Jun 12, 2019, at 6:30 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> >
> > On Wed, Jun 12, 2019 at 1:27 PM Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> >>> On Jun 12, 2019, at 12:55 PM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
> >>>
> >>>> On 6/12/19 10:08 AM, Marius Hillenbrand wrote:
> >>>> This patch series proposes to introduce a region for what we call
> >>>> process-local memory into the kernel's virtual address space.
> >>>
> >>> It might be fun to cc some x86 folks on this series. They might have
> >>> some relevant opinions. ;)
> >>>
> >>> A few high-level questions:
> >>>
> >>> Why go to all this trouble to hide guest state like registers if all the
> >>> guest data itself is still mapped?
> >>>
> >>> Where's the context-switching code? Did I just miss it?
> >>>
> >>> We've discussed having per-cpu page tables where a given PGD is only in
> >>> use from one CPU at a time. I *think* this scheme still works in such a
> >>> case, it just adds one more PGD entry that would have to context-switched.
> >>
> >> Fair warning: Linus is on record as absolutely hating this idea. He might change his mind, but it’s an uphill battle.
> >
> > I looked at the patch, and it (sensibly) has nothing to do with
> > per-cpu PGDs. So it's in great shape!
> >
> > Seriously, though, here are some very high-level review comments:
> >
> > Please don't call it "process local", since "process" is meaningless.
> > Call it "mm local" or something like that.
> >
> > We already have a per-mm kernel mapping: the LDT. So please nix all
> > the code that adds a new VA region, etc, except to the extent that
> > some of it consists of valid cleanups in and of itself. Instead,
> > please refactor the LDT code (arch/x86/kernel/ldt.c, mainly) to make
> > it use a more general "mm local" address range, and then reuse the
> > same infrastructure for other fancy things. The code that makes it
> > KASLR-able should be in its very own patch that applies *after* the
> > code that makes it all work so that, when the KASLR part causes a
> > crash, we can bisect it.
> >
> > +        /*
> > +         * Faults in process-local memory may be caused by process-local
> > +         * addresses leaking into other contexts.
> > +         * tbd: warn and handle gracefully.
> > +         */
> > +        if (unlikely(fault_in_process_local(address))) {
> > +                pr_err("page fault in PROCLOCAL at %lx", address);
> > +                force_sig_fault(SIGSEGV, SEGV_MAPERR, (void __user *)address, current);
> > +        }
> > +
> >
> > Huh? Either it's an OOPS or you shouldn't print any special
> > debugging. As it is, you're just blatantly leaking the address of the
> > mm-local range to malicious user programs.
> >
> > Also, you should IMO consider using this mechanism for kmap_atomic().
> > Hi, Nadav!
>
> Well, some context for the “hi” would have been helpful. (Do I have a bug
> and I still don’t understand it?)

Fair enough :)

>
> Perhaps you regard some use-case for a similar mechanism that I mentioned
> before. I did implement something similar (but not the way that you wanted)
> to improve the performance of seccomp and system-calls when retpolines are
> used. I set per-mm code area that held code that used direct calls to invoke
> seccomp filters and frequently used system-calls.
>
> My mechanism, I think, is more not suitable for this use-case. I needed my
> code-page to be at the same 2GB range as the kernel text/modules, which does
> complicate things. Due to the same reason, it is also limited in the size of
> the data/code that it can hold.
>

I actually meant the opposite. If we had a general-purpose per-mm
kernel address range, could it be used to optimize kmap_atomic() by
limiting the scope of any shootdowns? As a rough sketch, we'd have
some kmap_atomic slots for each cpu *in the mm-local region*. I'm not
entirely sure this is a win.

--Andy