Re: [PATCH 0/5] Second phase of future support for x86_64 5-level page tables

Hi Dave,

[...]
Thank you very much for the work you have done so far.  I have not spent
any time looking at the patches in detail, but instead I first ran a quick
test of the patch on a set of ~250 kernels that I keep around for testing,
where I just ran the "mod" command to at least verify that kernel virtual
addresses could be translated.

Now, as always, backwards compatibility must be maintained.  I do not have
any sadump dumpfiles, but obviously you (Fujitsu) can test those.

Yes, I am waiting for a machine that supports sadump.  I will test the
sadump dumpfiles.

However, I do have some older Xen and RHEL4-era kernels in my sample set.


Thank you so much for that; I will maintain backwards compatibility.

As it turns out, *all* RHEL4 kernels failed (i.e. any kernel version
2.6.9 or earlier); they report "WARNING: cannot access vmalloc'd
module memory" during initialization when trying to gather the kernel
module list.

All of the 2.6.9 and earlier kernels show the "WARNING: cannot
access vmalloc'd module memory" message during session initialization:

   $ crash vmlinux-2.6.9-42.0.2.ELsmp.gz vmcore
   crash 7.2.1rc26
   Copyright (C) 2002-2017  Red Hat, Inc.
   Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
   Copyright (C) 1999-2006  Hewlett-Packard Co
   Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
   Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
   Copyright (C) 2005, 2011  NEC Corporation
   Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
   Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
   This program is free software, covered by the GNU General Public License,
   and you are welcome to change it and/or distribute copies of it under
   certain conditions.  Enter "help copying" to see the conditions.
   This program has absolutely no warranty.  Enter "help warranty" for details.
   GNU gdb (GDB) 7.6
   Copyright (C) 2013 Free Software Foundation, Inc.
   License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
   This is free software: you are free to change and redistribute it.
   There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
   and "show warranty" for details.
   This GDB was configured as "x86_64-unknown-linux-gnu"...
   please wait... (gathering module symbol data)
   WARNING: cannot access vmalloc'd module memory

         KERNEL: vmlinux-2.6.9-42.0.2.ELsmp.gz
       DUMPFILE: vmcore
           CPUS: 8
           DATE: Tue Nov 21 19:14:17 2006
         UPTIME: 6 days, 01:23:25
   LOAD AVERAGE: 24.34, 7.89, 4.46
          TASKS: 865
       NODENAME: lonrs00268
        RELEASE: 2.6.9-42.0.2.ELsmp
        VERSION: #1 SMP Thu Aug 17 17:57:31 EDT 2006
        MACHINE: x86_64  (2199 Mhz)
         MEMORY: 16 GB
          PANIC: "Kernel BUG at panic:75"
            PID: 20046
        COMMAND: "oracle"
           TASK: 101c6b047f0  [THREAD_INFO: 101a428a000]
            CPU: 7
          STATE: TASK_RUNNING (NMI)
   crash>

If I run the session with "crash -d4 vmlinux-2.6.9-42.0.2.ELsmp.gz vmcore",
you can see that it reads a "pud page", but then fails:

   ...
   please wait... (gathering module symbol data)module: ffffffffa0634180
   <readmem: ffffffffa0634180, KVADDR, "module struct", 1408, (ROE|Q), f73780>
   <readmem: 4f8000, PHYSADDR, "pud page", 4096, (FOE), 2080b40>
   <read_diskdump: addr: 4f8000 paddr: 4f8000 cnt: 4096>
   crash: invalid kernel virtual address: ffffffffa0634180 type: "module struct"
   WARNING: cannot access vmalloc'd module memory
   ...

Without the patch, the module virtual address translation succeeds:

   ...
   please wait... (gathering module symbol data)module: ffffffffa0634180
   <readmem: ffffffffa0634180, KVADDR, "module struct", 1408, (ROE|Q), f705e0>
   <readmem: 103000, PHYSADDR, "pgd page", 4096, (FOE), 25d7b50>
   <read_diskdump: addr: 103000 paddr: 103000 cnt: 4096>
   <readmem: 105000, PHYSADDR, "pmd page", 4096, (FOE), 25d8b60>
   <read_diskdump: addr: 105000 paddr: 105000 cnt: 4096>
   <readmem: d9bfb0000, PHYSADDR, "page table", 4096, (FOE), 25d9b70>
   <read_diskdump: addr: d9bfb0000 paddr: d9bfb0000 cnt: 4096>
   <read_diskdump: addr: ffffffffa0634180 paddr: d9bfb3180 cnt: 1408>
   ...

So it appears to be reading from the wrong starting page table location,
i.e., from "pud page 4f8000" instead of "pgd page 103000".

Also, several Xen kernels failed with segmentation violations during
session initialization.  They all fail here in x86_64_xendump_load_page(),
when "*pgd" gets referenced:

   static char *
   x86_64_xendump_load_page(ulong kvaddr, struct xendump_data *xd)
   {
           ulong mfn;
           ulong *pgd, *pud, *pmd, *ptep;

           pgd = ((ulong *)machdep->pgd) + pgd_index(kvaddr);
           mfn = ((*pgd) & PHYSICAL_PAGE_MASK) >> PAGESHIFT();
                   ^^^^

Here is the relevant part of the gdb trace of a 2.6.18-based Xen
kernel:

Program terminated with signal 11, Segmentation fault.
#0  0x0000000000502748 in x86_64_xendump_load_page (kvaddr=kvaddr@entry=18446744071568498888, xd=0xf521a0 <xendump_data>,
     xd=0xf521a0 <xendump_data>) at x86_64.c:7003
7003		mfn = ((*pgd) & PHYSICAL_PAGE_MASK) >> PAGESHIFT();
Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libstdc++-4.8.5-11.el7.x86_64 lzo-2.06-8.el7.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64 snappy-1.1.0-3.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x0000000000502748 in x86_64_xendump_load_page (kvaddr=kvaddr@entry=18446744071568498888, xd=0xf521a0 <xendump_data>,
     xd=0xf521a0 <xendump_data>) at x86_64.c:7003
#1  0x0000000000503191 in x86_64_xendump_p2m_create (xd=0xf521a0 <xendump_data>) at x86_64.c:6749
#2  0x0000000000565d4e in xc_core_create_pfn_tables () at xendump.c:1258
#3  xc_core_read (addr=<optimized out>, paddr=7080864, cnt=32, bufptr=0xf70f80 <shared_bufs>) at xendump.c:168
#4  read_xendump (fd=<optimized out>, bufptr=0xf70f80 <shared_bufs>, cnt=32, addr=<optimized out>, paddr=7080864) at xendump.c:836
#5  0x000000000047b038 in readmem (addr=18446744071569148832, memtype=memtype@entry=1, buffer=buffer@entry=0xf70f80 <shared_bufs>,
     size=size@entry=32, type=type@entry=0x94dcc3 "possible", error_handle=error_handle@entry=2) at memory.c:2233
#6  0x00000000004ea33e in cpu_maps_init () at kernel.c:903
#7  kernel_init () at kernel.c:118
#8  0x0000000000467e5a in main_loop () at main.c:768
#9  0x000000000069dad3 in captured_command_loop (data=data@entry=0x0) at main.c:258
#10 0x000000000069c37a in catch_errors (func=func@entry=0x69dac0 <captured_command_loop>, func_args=func_args@entry=0x0,
     errstring=errstring@entry=0x8e713f "", mask=mask@entry=6) at exceptions.c:557
#11 0x000000000069ea66 in captured_main (data=data@entry=0x7ffd637c92a0) at main.c:1064
#12 0x000000000069c37a in catch_errors (func=func@entry=0x69dda0 <captured_main>, func_args=func_args@entry=0x7ffd637c92a0,
     errstring=errstring@entry=0x8e713f "", mask=mask@entry=6) at exceptions.c:557
#13 0x000000000069edc7 in gdb_main (args=0x7ffd637c92a0) at main.c:1079
#14 gdb_main_entry (argc=<optimized out>, argv=argv@entry=0x7ffd637c9408) at main.c:1099
#15 0x00000000004f0604 in gdb_main_loop (argc=<optimized out>, argc@entry=3, argv=argv@entry=0x7ffd637c9408) at gdb_interface.c:76
#16 0x00000000004662c5 in main (argc=3, argv=0x7ffd637c9408) at main.c:707
(gdb) p pgd
$1 = (ulong *) 0xfffffffc054f4210
(gdb)

I haven't investigated further, but in all of the xen cases, the
value of "pgd" above was a kernel virtual address as shown in the
example above.

However, without the patch, the function looks like this, and with
my debug printf of "pml4", the address is a user-space address as
expected:

   static char *
   x86_64_xendump_load_page(ulong kvaddr, struct xendump_data *xd)
   {
           ulong mfn;
           ulong *pml4, *pgd, *pmd, *ptep;

           pml4 = ((ulong *)machdep->machspec->pml4) + pml4_index(kvaddr);
           mfn = ((*pml4) & PHYSICAL_PAGE_MASK) >> PAGESHIFT();

           fprintf(fp, "x86_64_xendump_load_page: pml4: %lx\n", pml4);
           ...

So for example, with the debug statement, I see this:

   # crash vmlinux-2.6.18-1.2714.el5xen.gz xguest-crashdump
   crash 7.2.1rc26
   Copyright (C) 2002-2017  Red Hat, Inc.
   Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
   Copyright (C) 1999-2006  Hewlett-Packard Co
   Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
   Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
   Copyright (C) 2005, 2011  NEC Corporation
   Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
   Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
   This program is free software, covered by the GNU General Public License,
   and you are welcome to change it and/or distribute copies of it under
   certain conditions.  Enter "help copying" to see the conditions.
   This program has absolutely no warranty.  Enter "help warranty" for details.
   GNU gdb (GDB) 7.6
   Copyright (C) 2013 Free Software Foundation, Inc.
   License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
   This is free software: you are free to change and redistribute it.
   There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
   and "show warranty" for details.
   This GDB was configured as "x86_64-unknown-linux-gnu"...
   x86_64_xendump_load_page: pml4: 25d6c08
   x86_64_xendump_load_page: pml4: 25d6c08
         KERNEL: vmlinux-2.6.18-1.2714.el5xen.gz
       DUMPFILE: xguest-crashdump
   ...


In a private email, I will send you a pointer to where I have temporarily
stored the 2 vmlinux/vmcore pairs shown above.  I'm thinking that it will
probably be fairly easy for you to figure out what's happening in both cases.


Yes, I saw it! Thank you very much :-)

Thanks,
	dou.

Again, I very much appreciate the work you have undertaken here.

Thanks,
   Dave







--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility


