Hi Baoquan, Thanks for taking a look and coming up with a patch! (responses inline) Baoquan He <bhe@xxxxxxxxxx> writes: > On 10/26/22 at 04:15pm, Stephen Brennan wrote: >> Hi all, >> >> The /proc/kcore interface uses vread() to read memory addresses which >> are in the vmalloc region, and it seems that vread() doesn't support >> vmap_areas which are associated with a vmap_block. As far as I can tell, >> those have vmap_area.vm == NULL, and get skipped during vread()'s >> iteration. So the end result is that the read simply returns 0's. > > Hmm, with my understanding, it should be vm_map_ram() which is called > without a struct vm_struct associated, and it's the only interface do > to so. vmap_block is the optimization way based on percpu to reduce vmap > lock race. > >> >> This impacts live debuggers like gdb and drgn, which is how I stumbled >> upon it[1]. It looks like crash avoids the issue by doing a page table >> walk and reading the physical address. >> >> I'm wondering if there's any rationale for this omission from vread(): >> is it a simple oversight, or was it omitted due to the difficulty? Is >> it possible for /proc/kcore to simply take the page faults when it reads >> unmapped memory and handle them? (I'm sure that's already discussed or >> is obviously infeasible for some reason beyond me.) > > From git history, the vmlist iterating was taken in vread() at the > first place. Later, in below commit, when people changed to iterate > vmap_area_list instead, they just inherited the old code logic. I guess > that's why vmap with NULL ->vm is skipped. > > commit e81ce85f960c ("mm, vmalloc: iterate vmap_area_list, instead of vmlist in vread/vwrite()") > >> >> Ideally, I'm just looking for a way forward that allows the debugger to >> *work* as expected, meaning either that /proc/kcore always reads the >> correct data, or that the debugger can know ahead of time that it will >> need to do some processing (like a page table walk) first. > > I think we can adjust vread() to allow those vmap_area with NULL ->vm > being read out? Made a draft patch at below, please feel free to have a > test. Not sure if there's any risk. I tested this out and it fully resolved the problem! I didn't encounter any issues with the debuggers using /proc/kcore. My test kernel module is here: https://github.com/brenns10/kernel_stuff/blob/master/drgn_vmalloc_test/drgn_vmalloc_test.c I was able to read the full contents of the mapped area correctly with drgn thanks to this patch. > > From 9f1b786730f3ee0a8d5b48a94dbefa674102d7b9 Mon Sep 17 00:00:00 2001 > From: Baoquan He <bhe@xxxxxxxxxx> > Date: Thu, 27 Oct 2022 16:20:26 +0800 > Subject: [PATCH] mm/vmalloc.c: allow to read out vm_map_ram() areas in vread() > Content-type: text/plain > > Currently, vread can read out vmalloc areas who is associated with > a vm_struct. While this doesn't work for areas created by vm_map_ram() > interface because it doesn't allocate a vm_struct. Then in vread(), > these areas will be skipped. > > Pages are passed into vm_map_ram() and mapped onto frea vmap area, > it should be safe to read them out. Change code to allow to read > out these vm_map_ram() areas in vread(). I see, we don't even need to worry about page faults because vm_map_ram() is guaranteed to have mapped them. That does make things easier :) > > Signed-off-by: Baoquan He <bhe@xxxxxxxxxx> Reviewed-by: Stephen Brennan <stephen.s.brennan@xxxxxxxxxx> (or Tested-by if you prefer) > --- > mm/vmalloc.c | 15 +++++++-------- > 1 file changed, 7 insertions(+), 8 deletions(-) > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c > index ccaa461998f3..f899ab784671 100644 > --- a/mm/vmalloc.c > +++ b/mm/vmalloc.c > @@ -3526,7 +3526,7 @@ long vread(char *buf, char *addr, unsigned long count) > struct vm_struct *vm; > char *vaddr, *buf_start = buf; > unsigned long buflen = count; > - unsigned long n; > + unsigned long n, size; > > addr = kasan_reset_tag(addr); > > @@ -3547,12 +3547,11 @@ long vread(char *buf, char *addr, unsigned long count) > if (!count) > break; > > - if (!va->vm) > - continue; > - > vm = va->vm; > - vaddr = (char *) vm->addr; > - if (addr >= vaddr + get_vm_area_size(vm)) > + vaddr = (char *) va->va_start; > + size = vm ? get_vm_area_size(vm) : va_size(va); > + > + if (addr >= vaddr + size) > continue; > while (addr < vaddr) { > if (count == 0) > @@ -3562,10 +3561,10 @@ long vread(char *buf, char *addr, unsigned long count) > addr++; > count--; > } > - n = vaddr + get_vm_area_size(vm) - addr; > + n = vaddr + size - addr; > if (n > count) > n = count; > - if (!(vm->flags & VM_IOREMAP)) > + if (!vm || !(vm->flags & VM_IOREMAP)) > aligned_vread(buf, addr, n); > else /* IOREMAP area is treated as memory hole */ > memset(buf, 0, n); > -- > 2.34.1