Re: /proc/kcore reads 0's for vmap_block

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/26/22 at 04:15pm, Stephen Brennan wrote:
> Hi all,
> 
> The /proc/kcore interface uses vread() to read memory addresses which
> are in the vmalloc region, and it seems that vread() doesn't support
> vmap_areas which are associated with a vmap_block. As far as I can tell,
> those have vmap_area.vm == NULL, and get skipped during vread()'s
> iteration. So the end result is that the read simply returns 0's.

Hmm, with my understanding, it should be vm_map_ram() which is called
without a struct vm_struct associated, and it's the only interface do
to so. vmap_block is the optimization way based on percpu to reduce vmap
lock race.

> 
> This impacts live debuggers like gdb and drgn, which is how I stumbled
> upon it[1]. It looks like crash avoids the issue by doing a page table
> walk and reading the physical address.
> 
> I'm wondering if there's any rationale for this omission from vread():
> is it a simple oversight, or was it omitted due to the difficulty? Is
> it possible for /proc/kcore to simply take the page faults when it reads
> unmapped memory and handle them? (I'm sure that's already discussed or
> is obviously infeasible for some reason beyond me.)

>From git history, the vmlist iterating was taken in vread() at the
first place. Later, in below commit, when people changed to iterate
vmap_area_list instead, they just inherited the old code logic. I guess
that's why vmap with NULL ->vm is skipped.

commit e81ce85f960c ("mm, vmalloc: iterate vmap_area_list, instead of vmlist in vread/vwrite()")

> 
> Ideally, I'm just looking for a way forward that allows the debugger to
> *work* as expected, meaning either that /proc/kcore always reads the
> correct data, or that the debugger can know ahead of time that it will
> need to do some processing (like a page table walk) first. 

I think we can adjust vread() to allow those vmap_area with NULL ->vm
being read out? Made a draft patch at below, please feel free to have a
test. Not sure if there's any risk.

>From 9f1b786730f3ee0a8d5b48a94dbefa674102d7b9 Mon Sep 17 00:00:00 2001
From: Baoquan He <bhe@xxxxxxxxxx>
Date: Thu, 27 Oct 2022 16:20:26 +0800
Subject: [PATCH] mm/vmalloc.c: allow to read out vm_map_ram() areas in vread()
Content-type: text/plain

Currently, vread can read out vmalloc areas who is associated with
a vm_struct. While this doesn't work for areas created by vm_map_ram()
interface because it doesn't allocate a vm_struct. Then in vread(),
these areas will be skipped.

Pages are passed into vm_map_ram() and mapped onto frea vmap area,
it should be safe to read them out. Change code to allow to read
out these vm_map_ram() areas in vread().

Signed-off-by: Baoquan He <bhe@xxxxxxxxxx>
---
 mm/vmalloc.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ccaa461998f3..f899ab784671 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3526,7 +3526,7 @@ long vread(char *buf, char *addr, unsigned long count)
 	struct vm_struct *vm;
 	char *vaddr, *buf_start = buf;
 	unsigned long buflen = count;
-	unsigned long n;
+	unsigned long n, size;
 
 	addr = kasan_reset_tag(addr);
 
@@ -3547,12 +3547,11 @@ long vread(char *buf, char *addr, unsigned long count)
 		if (!count)
 			break;
 
-		if (!va->vm)
-			continue;
-
 		vm = va->vm;
-		vaddr = (char *) vm->addr;
-		if (addr >= vaddr + get_vm_area_size(vm))
+		vaddr = (char *) va->va_start;
+		size = vm ? get_vm_area_size(vm) : va_size(va);
+
+		if (addr >= vaddr + size)
 			continue;
 		while (addr < vaddr) {
 			if (count == 0)
@@ -3562,10 +3561,10 @@ long vread(char *buf, char *addr, unsigned long count)
 			addr++;
 			count--;
 		}
-		n = vaddr + get_vm_area_size(vm) - addr;
+		n = vaddr + size - addr;
 		if (n > count)
 			n = count;
-		if (!(vm->flags & VM_IOREMAP))
+		if (!vm || !(vm->flags & VM_IOREMAP))
 			aligned_vread(buf, addr, n);
 		else /* IOREMAP area is treated as memory hole */
 			memset(buf, 0, n);
-- 
2.34.1





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux