----- Original Message -----
> I found Dave had already done the first phase of future support for x86_64
> 5-level page tables (commit 307e7f35f510). When I asked him about the
> state of this work, he gave me a more detailed answer and suggestion.
> I followed his advice and did the following:
>
> 1. Refine the original logic:
>    1) Create some new common functions for getting the offset of page tables
>    2) Replace the PML4 and UPML with the common PGD:
>       machdep->machspec->pml4/upml ==> machdep->pgd
>    3) Use the PUD in x86_64
>
> 2. Add 5-level page table support for x86_64_k/uvtop()
>
> This patchset is the second phase of the work. As Dave said, we need
> a manner of determining very early on whether the kernel page tables are
> using 5-level paging and whether each user-space task is using 4- or
> 5-level page tables. That will be done after this phase.
>
> About testing:
>
> I have tested this patchset with 4-level and 5-level page tables.
>
> sadump, Xen, old Linux and RHEL4 have not been tested.

Hello Dou,

Thank you very much for the work you have done so far.

I have not spent any time looking at the patches in detail. Instead, I first
ran a quick test of the patch on a set of ~250 kernels that I keep around for
testing, where I just ran the "mod" command to at least verify that kernel
virtual addresses could be translated.

Now, as always, backwards compatibility must be maintained. I do not have any
sadump dumpfiles, but obviously you (Fujitsu) can test those. However, I do
have some older Xen and RHEL4-era kernels in my sample set.

As it turns out, *all* RHEL4 kernels failed (i.e., any kernel version 2.6.9 or
earlier), reporting "WARNING: cannot access vmalloc'd module memory" during
initialization when trying to gather the kernel module list.
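To make the "common functions for getting the offset of page tables" item in the quoted summary concrete, here is a minimal sketch of the per-level index arithmetic for 4- and 5-level x86_64 paging. It assumes an LP64 host (64-bit unsigned long), and every name in it (level_index(), top_level_index(), show_indices() and the *_SHIFT constants) is hypothetical, not crash's actual macros. It only shows that the PUD, PMD and PTE indices are computed the same way in both modes, while 5-level paging adds a P4D level and moves the top-level index from bits 47..39 to bits 56..48.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helpers, not crash's actual macros.  Assumes an LP64 host
 * (64-bit unsigned long).  Each x86_64 page-table level holds 512 8-byte
 * entries, so every level's index is a 9-bit field of the virtual address.
 */
#define ENTRIES_PER_TABLE 512UL
#define PTE_SHIFT         12   /* page table:            bits 20..12 */
#define PMD_SHIFT         21   /* page middle directory: bits 29..21 */
#define PUD_SHIFT         30   /* page upper directory:  bits 38..30 */
#define P4D_SHIFT         39   /* 5-level only (PML4):   bits 47..39 */
#define PGD_SHIFT_4LEVEL  39   /* top level = PML4:      bits 47..39 */
#define PGD_SHIFT_5LEVEL  48   /* top level = PML5:      bits 56..48 */

/* Extract the 9-bit table index that starts at bit "shift". */
static unsigned long level_index(unsigned long vaddr, int shift)
{
	return (vaddr >> shift) & (ENTRIES_PER_TABLE - 1);
}

/*
 * Only the top-level index moves when switching between 4- and 5-level
 * paging; the PUD, PMD and PTE indices are computed identically.
 */
static unsigned long top_level_index(unsigned long vaddr, int levels)
{
	assert(levels == 4 || levels == 5);
	return level_index(vaddr, levels == 5 ? PGD_SHIFT_5LEVEL
					      : PGD_SHIFT_4LEVEL);
}

/* Print the index used at each level, top level first. */
static void show_indices(unsigned long vaddr, int levels)
{
	printf("%016lx (%d-level): pgd %lu", vaddr, levels,
	       top_level_index(vaddr, levels));
	if (levels == 5)
		printf(" p4d %lu", level_index(vaddr, P4D_SHIFT));
	printf(" pud %lu pmd %lu pte %lu offset 0x%lx\n",
	       level_index(vaddr, PUD_SHIFT),
	       level_index(vaddr, PMD_SHIFT),
	       level_index(vaddr, PTE_SHIFT),
	       vaddr & ((1UL << PTE_SHIFT) - 1));
}
```

For the module address ffffffffa0634180 from the -d4 log, show_indices(0xffffffffa0634180UL, 4) prints "ffffffffa0634180 (4-level): pgd 511 pud 510 pmd 259 pte 52 offset 0x180". For a user-space address such as 0x7f0000000000, by contrast, the top-level index is 254 with 4-level paging but 0 with 5-level paging, so the paging mode has to be known before a user-space walk can pick the right top-level entry.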
All of the 2.6.9 and earlier kernels show the "WARNING: cannot access vmalloc'd
module memory" message during session initialization:

  $ crash vmlinux-2.6.9-42.0.2.ELsmp.gz vmcore

  crash 7.2.1rc26
  Copyright (C) 2002-2017 Red Hat, Inc.
  Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
  Copyright (C) 1999-2006 Hewlett-Packard Co
  Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
  Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
  Copyright (C) 2005, 2011 NEC Corporation
  Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
  Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
  This program is free software, covered by the GNU General Public License,
  and you are welcome to change it and/or distribute copies of it under
  certain conditions. Enter "help copying" to see the conditions.
  This program has absolutely no warranty. Enter "help warranty" for details.

  GNU gdb (GDB) 7.6
  Copyright (C) 2013 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law. Type "show copying"
  and "show warranty" for details.
  This GDB was configured as "x86_64-unknown-linux-gnu"...

  please wait... (gathering module symbol data)
  WARNING: cannot access vmalloc'd module memory

        KERNEL: vmlinux-2.6.9-42.0.2.ELsmp.gz
      DUMPFILE: vmcore
          CPUS: 8
          DATE: Tue Nov 21 19:14:17 2006
        UPTIME: 6 days, 01:23:25
  LOAD AVERAGE: 24.34, 7.89, 4.46
         TASKS: 865
      NODENAME: lonrs00268
       RELEASE: 2.6.9-42.0.2.ELsmp
       VERSION: #1 SMP Thu Aug 17 17:57:31 EDT 2006
       MACHINE: x86_64 (2199 Mhz)
        MEMORY: 16 GB
         PANIC: "Kernel BUG at panic:75"
           PID: 20046
       COMMAND: "oracle"
          TASK: 101c6b047f0 [THREAD_INFO: 101a428a000]
           CPU: 7
         STATE: TASK_RUNNING (NMI)

  crash>

If I run the session with "crash -d4 vmlinux-2.6.9-42.0.2.ELsmp.gz vmcore", you
can see that it reads a "pud page", but then fails:

  ...
  please wait... (gathering module symbol data)module: ffffffffa0634180
  <readmem: ffffffffa0634180, KVADDR, "module struct", 1408, (ROE|Q), f73780>
  <readmem: 4f8000, PHYSADDR, "pud page", 4096, (FOE), 2080b40>
  <read_diskdump: addr: 4f8000 paddr: 4f8000 cnt: 4096>
  crash: invalid kernel virtual address: ffffffffa0634180 type: "module struct"

  WARNING: cannot access vmalloc'd module memory
  ...

Without the patch, the module virtual address translation succeeds:

  ...
  please wait... (gathering module symbol data)module: ffffffffa0634180
  <readmem: ffffffffa0634180, KVADDR, "module struct", 1408, (ROE|Q), f705e0>
  <readmem: 103000, PHYSADDR, "pgd page", 4096, (FOE), 25d7b50>
  <read_diskdump: addr: 103000 paddr: 103000 cnt: 4096>
  <readmem: 105000, PHYSADDR, "pmd page", 4096, (FOE), 25d8b60>
  <read_diskdump: addr: 105000 paddr: 105000 cnt: 4096>
  <readmem: d9bfb0000, PHYSADDR, "page table", 4096, (FOE), 25d9b70>
  <read_diskdump: addr: d9bfb0000 paddr: d9bfb0000 cnt: 4096>
  <read_diskdump: addr: ffffffffa0634180 paddr: d9bfb3180 cnt: 1408>
  ...

So it appears to be reading from the wrong starting page table location, i.e.,
from "pud page 4f8000" instead of "pgd page 103000".

Also, several Xen kernels failed with segmentation violations during session
initialization. They all fail here in x86_64_xendump_load_page(), when "*pgd"
gets referenced:

  static char *
  x86_64_xendump_load_page(ulong kvaddr, struct xendump_data *xd)
  {
          ulong mfn;
          ulong *pgd, *pud, *pmd, *ptep;

          pgd = ((ulong *)machdep->pgd) + pgd_index(kvaddr);
          mfn = ((*pgd) & PHYSICAL_PAGE_MASK) >> PAGESHIFT();
                  ^^^^

Here is the relevant part of the gdb trace of a 2.6.18-based Xen kernel:

  Program terminated with signal 11, Segmentation fault.
  #0  0x0000000000502748 in x86_64_xendump_load_page (
      kvaddr=kvaddr@entry=18446744071568498888, xd=0xf521a0 <xendump_data>,
      xd=0xf521a0 <xendump_data>) at x86_64.c:7003
  7003            mfn = ((*pgd) & PHYSICAL_PAGE_MASK) >> PAGESHIFT();
  Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7.x86_64
      libgcc-4.8.5-11.el7.x86_64 libstdc++-4.8.5-11.el7.x86_64
      lzo-2.06-8.el7.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64
      snappy-1.1.0-3.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64
      zlib-1.2.7-17.el7.x86_64
  (gdb) bt
  #0  0x0000000000502748 in x86_64_xendump_load_page (
      kvaddr=kvaddr@entry=18446744071568498888, xd=0xf521a0 <xendump_data>,
      xd=0xf521a0 <xendump_data>) at x86_64.c:7003
  #1  0x0000000000503191 in x86_64_xendump_p2m_create (
      xd=0xf521a0 <xendump_data>) at x86_64.c:6749
  #2  0x0000000000565d4e in xc_core_create_pfn_tables () at xendump.c:1258
  #3  xc_core_read (addr=<optimized out>, paddr=7080864, cnt=32,
      bufptr=0xf70f80 <shared_bufs>) at xendump.c:168
  #4  read_xendump (fd=<optimized out>, bufptr=0xf70f80 <shared_bufs>, cnt=32,
      addr=<optimized out>, paddr=7080864) at xendump.c:836
  #5  0x000000000047b038 in readmem (addr=18446744071569148832,
      memtype=memtype@entry=1, buffer=buffer@entry=0xf70f80 <shared_bufs>,
      size=size@entry=32, type=type@entry=0x94dcc3 "possible",
      error_handle=error_handle@entry=2) at memory.c:2233
  #6  0x00000000004ea33e in cpu_maps_init () at kernel.c:903
  #7  kernel_init () at kernel.c:118
  #8  0x0000000000467e5a in main_loop () at main.c:768
  #9  0x000000000069dad3 in captured_command_loop (data=data@entry=0x0)
      at main.c:258
  #10 0x000000000069c37a in catch_errors (
      func=func@entry=0x69dac0 <captured_command_loop>,
      func_args=func_args@entry=0x0, errstring=errstring@entry=0x8e713f "",
      mask=mask@entry=6) at exceptions.c:557
  #11 0x000000000069ea66 in captured_main (data=data@entry=0x7ffd637c92a0)
      at main.c:1064
  #12 0x000000000069c37a in catch_errors (
      func=func@entry=0x69dda0 <captured_main>,
      func_args=func_args@entry=0x7ffd637c92a0, errstring=errstring@entry=0x8e713f "",
      mask=mask@entry=6) at exceptions.c:557
  #13 0x000000000069edc7 in gdb_main (args=0x7ffd637c92a0) at main.c:1079
  #14 gdb_main_entry (argc=<optimized out>, argv=argv@entry=0x7ffd637c9408)
      at main.c:1099
  #15 0x00000000004f0604 in gdb_main_loop (argc=<optimized out>, argc@entry=3,
      argv=argv@entry=0x7ffd637c9408) at gdb_interface.c:76
  #16 0x00000000004662c5 in main (argc=3, argv=0x7ffd637c9408) at main.c:707
  (gdb) p pgd
  $1 = (ulong *) 0xfffffffc054f4210
  (gdb)

I haven't investigated further, but in all of the Xen cases, the value of
"pgd" was a kernel virtual address, like the one shown above.

Without the patch, the function looks like this, and my debug printf of
"pml4" shows a user-space address, as expected:

  static char *
  x86_64_xendump_load_page(ulong kvaddr, struct xendump_data *xd)
  {
          ulong mfn;
          ulong *pml4, *pgd, *pmd, *ptep;

          pml4 = ((ulong *)machdep->machspec->pml4) + pml4_index(kvaddr);
          mfn = ((*pml4) & PHYSICAL_PAGE_MASK) >> PAGESHIFT();

          fprintf(fp, "x86_64_xendump_load_page: pml4: %lx\n", pml4);
          ...

For example, with the debug statement in place, I see this:

  # crash vmlinux-2.6.18-1.2714.el5xen.gz xguest-crashdump

  crash 7.2.1rc26
  ...
  This GDB was configured as "x86_64-unknown-linux-gnu"...

  x86_64_xendump_load_page: pml4: 25d6c08
  x86_64_xendump_load_page: pml4: 25d6c08

        KERNEL: vmlinux-2.6.18-1.2714.el5xen.gz
      DUMPFILE: xguest-crashdump
  ...

In a private email, I will send you a pointer to where I have temporarily
stored the two vmlinux/vmcore pairs shown above. I suspect it will be fairly
easy for you to figure out what is happening in both cases.

Again, I very much appreciate the work you have undertaken here.

Thanks,
  Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility