[to-be-updated] vmalloc-new-flag-for-flush-before-releasing-pages.patch removed from -mm tree

The patch titled
     Subject: vmalloc: new flag for flush before releasing pages
has been removed from the -mm tree.  Its filename was
     vmalloc-new-flag-for-flush-before-releasing-pages.patch

This patch was dropped because an updated version will be merged

------------------------------------------------------
From: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
Subject: vmalloc: new flag for flush before releasing pages

Patch series "Don't leave executable TLB entries to freed pages".

Sometimes when memory is freed via the module subsystem, a TLB entry with
executable permissions can remain for a freed page.  If the page is re-used
to back an address that will receive data from userspace, it can result in
user data being mapped as executable in the kernel.  The root of this
behavior is that vfree flushes the TLB lazily but does not free the
underlying pages lazily.

There are roughly three categories of this, which show up across modules,
bpf, kprobes and ftrace:

1. When executable memory is touched and then immediately freed

   This shows up in a couple of error conditions in the module loader and BPF
   JIT compiler.

2. When executable memory is set to RW right before being freed

   In this case (on x86 and probably others) there will be a TLB flush when it
   is set to RW, and since the pages are not touched between that flush and
   the free, the mapping should not be in the TLB in most cases. So this
   category is not as big a concern. However, technically there is still a
   race where an attacker could try to keep the entry alive for a short window
   with a well-timed out-of-bounds read or speculative read, so ideally this
   could be blocked as well.

3. When executable memory is freed in an interrupt

   At least one example of this is the freeing of init sections in the module
   loader. Since vmalloc reuses the allocation for the work queue linked list
   node for the deferred frees, the memory actually gets touched as part of the
   vfree operation and so returns to the TLB even after the flush from resetting
   the permissions.

I have only actually tested category 1, and identified 2 and 3 just from
reading the code.

To catch all of these, module_alloc for x86 is changed to use a new flag
that instructs the unmap operation to flush the TLB before freeing the
pages.

If this solution seems good I can plug the flag in for other architectures
that define PAGE_KERNEL_EXEC.


This patch (of 2):

Since vfree will lazily flush the TLB, but not lazily free the underlying
pages, it often leaves stale TLB entries to freed pages that could get
re-used.  This is undesirable for cases where the memory being freed has
special permissions such as executable.

Having callers flush the TLB after calling vfree still leaves a window
where the pages are freed, but the TLB entry remains.  Also, the entire
operation can be deferred if vfree is called from an interrupt, so a TLB
flush after calling vfree could run before the unmap and free have actually
happened.  To support this use case, a new flag, VM_IMMEDIATE_UNMAP, is
added that causes the free operation to take place like this:

        1. Unmap
        2. Flush TLB/Unmap aliases
        3. Free pages

In the deferred case these steps are all done by the workqueue.

This implementation derives from two sketches from Dave Hansen and Andy
Lutomirski.

Link: http://lkml.kernel.org/r/20181128000754.18056-2-rick.p.edgecombe@xxxxxxxxx
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@xxxxxxxxx>
Suggested-by: Dave Hansen <dave.hansen@xxxxxxxxx>
Suggested-by: Andy Lutomirski <luto@xxxxxxxxxx>
Suggested-by: Will Deacon <will.deacon@xxxxxxx>
Cc: Naveen N. Rao <naveen.n.rao@xxxxxxxxxxxxxxxxxx>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@xxxxxxxxx>
Cc: David S. Miller <davem@xxxxxxxxxxxxx>
Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
Cc: Steven Rostedt (VMware) <rostedt@xxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
Cc: Jessica Yu <jeyu@xxxxxxxxxx>
Cc: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
Cc: Jann Horn <jannh@xxxxxxxxxx>
Cc: Kristen Carlson Accardi <kristen@xxxxxxxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 include/linux/vmalloc.h |    1 +
 mm/vmalloc.c            |   13 +++++++++++--
 2 files changed, 12 insertions(+), 2 deletions(-)

--- a/include/linux/vmalloc.h~vmalloc-new-flag-for-flush-before-releasing-pages
+++ a/include/linux/vmalloc.h
@@ -21,6 +21,7 @@ struct notifier_block;		/* in notifier.h
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_IMMEDIATE_UNMAP	0x00000200	/* flush before releasing pages */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
--- a/mm/vmalloc.c~vmalloc-new-flag-for-flush-before-releasing-pages
+++ a/mm/vmalloc.c
@@ -1516,6 +1516,14 @@ static void __vunmap(const void *addr, i
 	debug_check_no_obj_freed(area->addr, get_vm_area_size(area));
 
 	remove_vm_area(addr);
+
+	/*
+	 * Need to flush the TLB before freeing pages in the case of this flag.
+	 * As long as that's happening, unmap aliases.
+	 */
+	if (area->flags & VM_IMMEDIATE_UNMAP)
+		vm_unmap_aliases();
+
 	if (deallocate_pages) {
 		int i;
 
@@ -1925,8 +1933,9 @@ EXPORT_SYMBOL(vzalloc_node);
 
 void *vmalloc_exec(unsigned long size)
 {
-	return __vmalloc_node(size, 1, GFP_KERNEL, PAGE_KERNEL_EXEC,
-			      NUMA_NO_NODE, __builtin_return_address(0));
+	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+			GFP_KERNEL, PAGE_KERNEL_EXEC, VM_IMMEDIATE_UNMAP,
+			NUMA_NO_NODE, __builtin_return_address(0));
 }
 
 #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)
_

Patches currently in -mm which might be from rick.p.edgecombe@xxxxxxxxx are




