Re: [PATCH fix 6.12] mm: mark mas allocation in vms_abort_munmap_vmas as __GFP_NOFAIL

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



* Jann Horn <jannh@xxxxxxxxxx> [241016 11:08]:
> vms_abort_munmap_vmas() is a recovery path where, on entry, some VMAs
> have already been torn down halfway (in a way we can't undo) but are
> still present in the maple tree.
> 
> At this point, we *must* remove the VMAs from the VMA tree, otherwise
> we get UAF.

Wait, the vma should be re-attached without PTEs and files closed in
this case.  I don't see how we are hitting the UAF in your example - we
shouldn't unless there is something with the driver itself and the file
close?

My concern is that I am missing an error scenario.

> 
> Because removing VMA tree nodes can require memory allocation, the
> existing code has an error path which tries to handle this by
> reattaching the VMAs; but that can't be done safely.
> 
> A nicer way to fix it would probably be to preallocate enough maple
> tree nodes for the removal before the point of no return, or something
> like that; but for now, fix it the easy and kinda ugly way, by marking
> this allocation __GFP_NOFAIL.

I don't think that's any better than what happens now or what you have
below.  We have a way to do what you are saying, but it would slow down
everyone for the sake of having enough memory around for the very rare
error path, and it is certainly better to dip into the reserves at that
point.  Your patch is better than this alternative.

But since this is under a lock that allows sleeping, shouldn't the
shrinker kick in?  Should we just use __GFP_NOFAIL for the first store
instead of the error path?

> 
> Fixes: 4f87153e82c4 ("mm: change failure of MAP_FIXED to restoring the gap on failure")
> Signed-off-by: Jann Horn <jannh@xxxxxxxxxx>
> ---
> This can be tested with the following reproducer (on a kernel built with
> CONFIG_KASAN=y, CONFIG_FAILSLAB=y, CONFIG_FAULT_INJECTION_DEBUG_FS=y,
> with the reproducer running as root):
> 
> ```
> 
>   typeof(x) __res = (x);      \
>   if (__res == (typeof(x))-1) \
>     err(1, "SYSCHK(" #x ")"); \
>   __res;                      \
> })
> 
> static void write_file(char *name, char *buf) {
>   int fd = open(name, O_WRONLY);
>   if (fd == -1)
>     err(1, "unable to open for writing: %s", name);
>   if (SYSCHK(write(fd, buf, strlen(buf))) != strlen(buf))
>     errx(1, "write %s", name);
>   SYSCHK(close(fd));
> }
> 
> int main(void) {
>   // make a large area with a bunch of VMAs
>   char *area = SYSCHK(mmap(NULL, AREA_SIZE, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0));
>   for (int off=0; off<AREA_SIZE; off += 0x2000)
>     SYSCHK(mmap(area+off, 0x1000, PROT_READ, MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS, -1, 0));
> 
>   // open a file whose mappings use usbdev_vm_ops, and map it in part of this area
>   int map_fd = SYSCHK(open("/dev/bus/usb/001/001", O_RDONLY));
>   void *map = SYSCHK(mmap(area, 0x1000, PROT_READ, MAP_SHARED|MAP_FIXED, map_fd, 0));
>   close(map_fd);
> 
>   // make RWX mapping request fail late
>   SYSCHK(prctl(65/*PR_SET_MDWE*/, 1/*PR_MDWE_REFUSE_EXEC_GAIN*/, 0, 0, 0));
> 
>   // some random other file
>   int fd = SYSCHK(open("/etc/passwd", O_RDONLY));
> 
>   /* disarm for now */
>   write_file("/sys/kernel/debug/failslab/probability", "0");
> 
>   /* fail allocations of maple_node... */
>   write_file("/sys/kernel/debug/failslab/cache-filter", "Y");
>   write_file("/sys/kernel/slab/maple_node/failslab", "1");
>   /* ... even though it's a sleepable allocation... */
>   write_file("/sys/kernel/debug/failslab/ignore-gfp-wait", "N");
>   /* ... in this task... */
>   write_file("/sys/kernel/debug/failslab/task-filter", "Y");
>   write_file("/proc/self/make-it-fail", "1");
>   /* ... every time... */
>   write_file("/sys/kernel/debug/failslab/times", "-1");
>   /* ... after 8 successful allocations (value chosen experimentally)... */
>   write_file("/sys/kernel/debug/failslab/space", "2048"); // one object is 256
>   /* ... and print where the allocations actually failed... */
>   write_file("/sys/kernel/debug/failslab/verbose", "2");
>   /* ... and that's it, arm it */
>   write_file("/sys/kernel/debug/failslab/probability", "100");
> 
>   // This mmap request will fail late due to MDWE.
>   // The error recovery path will attempt to clear out the VMA pointers with a
>   // spanning_store (which will be blocked by error injection).
>   mmap(area, AREA_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, fd, 0);
> 
>   /* disarm */
>   write_file("/sys/kernel/debug/failslab/probability", "0");
> 
>   SYSCHK(munmap(map, 0x1000)); // UAF expected here
>   system("cat /proc/$PPID/maps");
> }
> ```
> 
> Result:
> ```
> FAULT_INJECTION: forcing a failure.
> name failslab, interval 1, probability 100, space 256, times -1
> CPU: 3 UID: 0 PID: 607 Comm: unmap-fail Not tainted 6.12.0-rc3-00013-geca631b8fe80 #518
> [...]
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x80/0xa0
>  should_fail_ex+0x4d3/0x5c0
> [...]
>  should_failslab+0xc7/0x130
>  kmem_cache_alloc_noprof+0x73/0x3a0
> [...]
>  mas_alloc_nodes+0x3a3/0x690
>  mas_nomem+0xaa/0x1d0
>  mas_store_gfp+0x515/0xa80
> [...]
>  mmap_region+0xa96/0x2590
> [...]
>  do_mmap+0x71e/0xfe0
> [...]
>  vm_mmap_pgoff+0x17a/0x2f0
> [...]
>  ksys_mmap_pgoff+0x2ee/0x460
>  do_syscall_64+0x68/0x140
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [...]
>  </TASK>
> mmap: unmap-fail: (607) Unable to abort munmap() operation
> ==================================================================
> BUG: KASAN: slab-use-after-free in dec_usb_memory_use_count+0x365/0x430
> Read of size 8 at addr ffff88810e9ba8b8 by task unmap-fail/607

What was this pointer?

> 
> CPU: 3 UID: 0 PID: 607 Comm: unmap-fail Not tainted 6.12.0-rc3-00013-geca631b8fe80 #518
> [...]
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x66/0xa0
>  print_report+0xce/0x670
> [...]
>  kasan_report+0xf7/0x130
> [...]
>  dec_usb_memory_use_count+0x365/0x430
>  remove_vma+0x76/0x120
>  vms_complete_munmap_vmas+0x447/0x750
>  do_vmi_align_munmap+0x4b9/0x700
> [...]
>  do_vmi_munmap+0x164/0x2e0
>  __vm_munmap+0x128/0x2a0
> [...]
>  __x64_sys_munmap+0x59/0x80
>  do_syscall_64+0x68/0x140
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [...]
>  </TASK>
> 
> Allocated by task 607:
>  kasan_save_stack+0x33/0x60
>  kasan_save_track+0x14/0x30
>  __kasan_kmalloc+0xaa/0xb0
>  usbdev_mmap+0x1a0/0xaf0
>  mmap_region+0xf6e/0x2590
>  do_mmap+0x71e/0xfe0
>  vm_mmap_pgoff+0x17a/0x2f0
>  ksys_mmap_pgoff+0x2ee/0x460
>  do_syscall_64+0x68/0x140
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> Freed by task 607:
>  kasan_save_stack+0x33/0x60
>  kasan_save_track+0x14/0x30
>  kasan_save_free_info+0x3b/0x60
>  __kasan_slab_free+0x4f/0x70
>  kfree+0x148/0x450
>  vms_clean_up_area+0x188/0x220

What line is this?

>  mmap_region+0xf1b/0x2590
>  do_mmap+0x71e/0xfe0
>  vm_mmap_pgoff+0x17a/0x2f0
>  ksys_mmap_pgoff+0x2ee/0x460
>  do_syscall_64+0x68/0x140
>  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [...]
> ==================================================================
> ```
> ---
>  mm/vma.h | 12 +++---------
>  1 file changed, 3 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/vma.h b/mm/vma.h
> index 819f994cf727..ebd78f1577f3 100644
> --- a/mm/vma.h
> +++ b/mm/vma.h
> @@ -241,15 +241,9 @@ static inline void vms_abort_munmap_vmas(struct vma_munmap_struct *vms,
>  	 * failure method of leaving a gap where the MAP_FIXED mapping failed.
>  	 */
>  	mas_set_range(mas, vms->start, vms->end - 1);
> -	if (unlikely(mas_store_gfp(mas, NULL, GFP_KERNEL))) {
> -		pr_warn_once("%s: (%d) Unable to abort munmap() operation\n",
> -			     current->comm, current->pid);
> -		/* Leaving vmas detached and in-tree may hamper recovery */
> -		reattach_vmas(mas_detach);
> -	} else {
> -		/* Clean up the insertion of the unfortunate gap */
> -		vms_complete_munmap_vmas(vms, mas_detach);
> -	}
> +	mas_store_gfp(mas, NULL, GFP_KERNEL|__GFP_NOFAIL);
> +	/* Clean up the insertion of the unfortunate gap */
> +	vms_complete_munmap_vmas(vms, mas_detach);
>  }
>  
>  int
> 
> ---
> base-commit: eca631b8fe808748d7585059c4307005ca5c5820
> change-id: 20241016-fix-munmap-abort-2330b61332aa
> -- 
> Jann Horn <jannh@xxxxxxxxxx>
> 




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux