Re: [PATCH 00/23] KAISER: unmap most of the kernel from userspace page tables

On 11/01/2017 09:08 AM, Linus Torvalds wrote:
> On Tue, Oct 31, 2017 at 4:44 PM, Dave Hansen
> <dave.hansen@xxxxxxxxxxxxxxx> wrote:
>> On 10/31/2017 04:27 PM, Linus Torvalds wrote:
>>>  (c) am I reading the code correctly, and the shadow page tables are
>>> *completely* duplicated?
>>>
>>>      That seems insane. Why isn't only the top level shadowed, and
>>> then lower levels are shared between the shadowed and the "kernel"
>>> page tables?
>>
>> There are obviously two PGDs.  The userspace half of the PGD is an exact
>> copy so all the lower levels are shared.  The userspace copying is
>> done via the code we add to native_set_pgd().
> 
> So the thing that made me think you do all levels was that confusing
> kaiser_pagetable_walk() code (and to a lesser degree
> get_pa_from_mapping()).
> 
> That code definitely walks and allocates all levels.
> 
> So it really doesn't seem to be just sharing the top page table entry.

Yeah, they're quite lightly commented and badly named now that I go look
at them.

get_pa_from_mapping() should be called something like
get_pa_from_kernel_map().  Its job is to look at the main (kernel) page
tables and go get an address from there.  It's only ever called on
kernel addresses.
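
Roughly, that walk looks like the sketch below.  This is hedged and
simplified, not the code from the patch: it follows the normal
pgd/p4d/pud/pmd/pte chain for a kernel virtual address and hands back
the physical address, skipping the large-page cases the real helper
has to deal with.  The example_* name is made up for illustration.

/*
 * Hedged sketch, not the patch's code: resolve a kernel virtual
 * address to a physical address by walking the regular kernel page
 * tables.  Large (1GB/2MB) mappings are skipped for brevity.
 */
static unsigned long example_kernel_vaddr_to_pa(unsigned long vaddr)
{
	pgd_t *pgd = pgd_offset_k(vaddr);	/* kernel page tables */
	p4d_t *p4d;
	pud_t *pud;
	pmd_t *pmd;
	pte_t *pte;

	if (pgd_none(*pgd))
		return -1;
	p4d = p4d_offset(pgd, vaddr);
	if (p4d_none(*p4d))
		return -1;
	pud = pud_offset(p4d, vaddr);
	if (pud_none(*pud))
		return -1;
	pmd = pmd_offset(pud, vaddr);
	if (pmd_none(*pmd))
		return -1;
	pte = pte_offset_kernel(pmd, vaddr);
	if (pte_none(*pte))
		return -1;
	return (pte_pfn(*pte) << PAGE_SHIFT) | (vaddr & ~PAGE_MASK);
}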

kaiser_pagetable_walk() should probably be
kaiser_shadow_pagetable_walk().  Its job is to walk the shadow copy and
find the location of a 4k PTE.  You can then populate that PTE with the
address you got from get_pa_from_mapping() (or clear it in the remove
mapping case).
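
Put together, the add-mapping path for a single page is roughly the
sketch below.  It is a simplified restatement of the loop you can see
in kaiser_add_user_map() in the patch, with the flag plumbing and
error reporting trimmed; the example_* name is made up.

/*
 * Hedged sketch of how the two helpers pair up for one page,
 * simplified from kaiser_add_user_map(): look up the physical
 * address in the kernel page tables, then walk (and allocate) the
 * shadow tables to find the PTE slot to put it in.
 */
static int example_shadow_map_one_page(unsigned long kernel_vaddr,
				       unsigned long flags)
{
	unsigned long target_pa = get_pa_from_kernel_map(kernel_vaddr);
	pte_t *pte;

	if (target_pa == (unsigned long)-1)
		return -EIO;

	pte = kaiser_shadow_pagetable_walk(kernel_vaddr, 0);
	if (!pte)
		return -ENOMEM;

	if (pte_none(*pte))
		set_pte(pte, __pte(flags | target_pa));
	return 0;
}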

I've attached an update to the core patch and Documentation that should
help clear this up.

> And that worries me because that seems to be a very fundamental coherency issue.
> 
> I'm assuming that this is about mapping only the individual kernel
> parts, but I'd like to get comments and clarification about that.

I assume that you're really worried about having to go two places to do
one thing, like clearing a dirty bit, or unmapping a PTE, especially
when we have to do that for userspace.  Thankfully, the sharing of the
page tables (under the PGD) for userspace gets rid of most of this
nastiness.
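
For reference, the PGD-level mirroring that makes that sharing happen
is conceptually like the sketch below.  This is a hedged sketch of
the idea, not the actual native_set_pgd() hunk from the series; the
"shadow PGD sits one page after the kernel PGD" layout and the inline
userspace-half check are assumptions here, and the example_* name is
made up.

/*
 * Hedged sketch of the PGD mirroring idea, not the series' exact
 * native_set_pgd() code.  Assumption for illustration: the shadow
 * PGD page is allocated directly after the kernel PGD page.
 */
static void example_mirror_user_pgd_entry(pgd_t *pgdp, pgd_t pgd)
{
	pgd_t *shadow_pgdp = (pgd_t *)((unsigned long)pgdp + PAGE_SIZE);

	/* The kernel's own page tables always get the entry. */
	*pgdp = pgd;

	/*
	 * Userspace entries live in the low half of the PGD page.
	 * Mirror only those, so both PGDs point at the *same*
	 * lower-level tables and nothing below the PGD needs to be
	 * kept in sync by hand.
	 */
	if (((unsigned long)pgdp & ~PAGE_MASK) < PAGE_SIZE / 2)
		*shadow_pgdp = pgd;
}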

I hope that's more clear now.
diff --git a/Documentation/x86/kaiser.txt b/Documentation/x86/kaiser.txt
index 67a70d2..5b5e9c4 100644
--- a/Documentation/x86/kaiser.txt
+++ b/Documentation/x86/kaiser.txt
@@ -1,3 +1,6 @@
+Overview
+========
+
 KAISER is a countermeasure against attacks on kernel address
 information.  There are at least three existing, published,
 approaches using the shared user/kernel mapping and hardware features
@@ -18,6 +21,35 @@ This helps ensure that side-channel attacks that leverage the
 paging structures do not function when KAISER is enabled.  It
 can be enabled by setting CONFIG_KAISER=y
 
+Page Table Management
+=====================
+
+KAISER logically keeps a "copy" of the page tables which unmap
+the kernel while in userspace.  The kernel manages the page
+tables as normal, but the "copying" is done with a few tricks
+that mean that we do not have to manage two full copies.
+
+The first trick is that for any new kernel mapping, we
+presume that we do not want it mapped to userspace.  That means
+we normally have no copying to do.  We only copy the kernel
+entries over to the shadow in response to a kaiser_add_*()
+call, which is rare.
+
+For a new userspace mapping, the kernel makes the entries in
+its page tables like normal.  The only difference is when the
+kernel makes entries in the top (PGD) level.  In addition to
+setting the entry in the main kernel PGD, a copy of the entry
+is made in the shadow PGD.
+
+PGD entries always point to another page table.  Two PGD
+entries pointing to the same thing give us shared page tables
+for all the lower entries.  This leaves a single, shared set of
+userspace page tables to manage.  One PTE to lock, one set
+of accessed bits, dirty bits, etc...
+
+Overhead
+========
+
 Protection against side-channel attacks is important.  But,
 this protection comes at a cost:
 
diff --git a/arch/x86/mm/kaiser.c b/arch/x86/mm/kaiser.c
index 57f7637..cde9014 100644
--- a/arch/x86/mm/kaiser.c
+++ b/arch/x86/mm/kaiser.c
@@ -49,9 +49,21 @@
 static DEFINE_SPINLOCK(shadow_table_allocation_lock);
 
 /*
+ * This is a generic page table walker used only for walking kernel
+ * addresses.  We use it to help recreate the "shadow" page tables
+ * which are used while we are in userspace.
+ *
+ * This can be called on any kernel memory addresses and will work
+ * with any page sizes and any types: normal linear map memory,
+ * vmalloc(), even kmap().
+ *
+ * Note: this is only used when mapping new *kernel* entries into
+ * the user/shadow page tables.  It is never used for userspace
+ * addresses.
+ *
  * Returns -1 on error.
  */
-static inline unsigned long get_pa_from_mapping(unsigned long vaddr)
+static inline unsigned long get_pa_from_kernel_map(unsigned long vaddr)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
@@ -59,6 +71,8 @@ static inline unsigned long get_pa_from_mapping(unsigned long vaddr)
 	pmd_t *pmd;
 	pte_t *pte;
 
+	WARN_ON_ONCE(vaddr < PAGE_OFFSET);
+
 	pgd = pgd_offset_k(vaddr);
 	/*
 	 * We made all the kernel PGDs present in kaiser_init().
@@ -111,13 +125,19 @@ static inline unsigned long get_pa_from_mapping(unsigned long vaddr)
 }
 
 /*
- * This is a relatively normal page table walk, except that it
- * also tries to allocate page tables pages along the way.
+ * Walk the shadow copy of the page tables, optionally allocating
+ * page table pages on the way down.  Does not support
+ * large pages since the data we are mapping is (generally) not
+ * large enough or aligned to 2MB.
+ *
+ * Note: this is only used when mapping *new* kernel data into the
+ * user/shadow page tables.  It is never used for userspace data.
  *
  * Returns a pointer to a PTE on success, or NULL on failure.
  */
 #define KAISER_WALK_ATOMIC  0x1
-static pte_t *kaiser_pagetable_walk(unsigned long address, unsigned long flags)
+static pte_t *kaiser_shadow_pagetable_walk(unsigned long address,
+					   unsigned long flags)
 {
 	pmd_t *pmd;
 	pud_t *pud;
@@ -207,11 +227,11 @@ int kaiser_add_user_map(const void *__start_addr, unsigned long size,
 	unsigned long target_address;
 
 	for (; address < end_addr; address += PAGE_SIZE) {
-		target_address = get_pa_from_mapping(address);
+		target_address = get_pa_from_kernel_map(address);
 		if (target_address == -1)
 			return -EIO;
 
-		pte = kaiser_pagetable_walk(address, false);
+		pte = kaiser_shadow_pagetable_walk(address, false);
 		/*
 		 * Errors come from either -ENOMEM for a page
 		 * table page, or something screwy that did a
@@ -348,7 +368,7 @@ void kaiser_remove_mapping(unsigned long start, unsigned long size)
 		 * context.  This should not do any allocations because we
 		 * should only be walking things that are known to be mapped.
 		 */
-		pte_t *pte = kaiser_pagetable_walk(addr, KAISER_WALK_ATOMIC);
+		pte_t *pte = kaiser_shadow_pagetable_walk(addr, KAISER_WALK_ATOMIC);
 
 		/*
 		 * We are removing a mapping that shoud
