Re: gcore: Segmentation fault due to renaming of old_rsp symbol in kernel

----- Original Message -----
> > From: crash-utility-bounces@xxxxxxxxxx [crash-utility-bounces@xxxxxxxxxx]
> > on behalf of Dave Anderson [anderson@xxxxxxxxxx]
> > Sent: Monday, October 31, 2016 4:47 PM
> 
> > It's always appreciated when bug reports come with proposed fixes, and
> > your patch certainly looks reasonable to me. But the gcore extension
> > module is maintained by Daisuke Hatayama, and any changes will require
> > his ACK and a subsequent package update. Daisuke is a member of this
> > mailing list, but just to make sure he sees this, I've cc'd him directly
> > as well.
> 
> Oh, absolutely -- I expected that.  The message started as a plea for help
> which I didn't want to bother him directly with (but I knew he monitored
> this list).  Then as I did my due diligence, it morphed into a solution.  I
> should have switched gears and sent it directly to him at that point (and
> thank you for doing so) but I guess I was already on track to post it to the
> list.  Thanks for the feedback.

Just to clarify -- I didn't mean to imply that extension module bug reports and
patches shouldn't be posted to this list.  By all means they should be.

Thanks,
  Dave


> 
> 
> ----- Original Message -----
> 
> > I am trying to use gcore to generate a user application core from a kernel
> > dump file. I compiled the latest crash-7.1.6 and crash-gcore-command-1.3.1
> > from https://people.redhat.com/anderson/. I installed a debug kernel
> > (vmlinux-4.1.34-33-debug.gz from openSUSE Leap 42.1) and did a controlled
> > (sysrq-trigger) crash. When I attempt to use gcore on the process in
> > question, after reading
> > <https://people.redhat.com/anderson/extensions/gcore_help_gcore.html>, I
> > get a segmentation fault:
> 
> > 
> 
> > eje-code:~ # crash /boot/vmlinux-4.1.34-33-debug.gz
> 
> > /var/crash/2016-10-31-17\:01//vmcore
> 
> > 
> 
> > crash 7.1.6
> 
> > Copyright (C) 2002-2016 Red Hat, Inc.
> 
> > Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> 
> > Copyright (C) 1999-2006 Hewlett-Packard Co
> 
> > Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> 
> > Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> 
> > Copyright (C) 2005, 2011 NEC Corporation
> 
> > Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> 
> > Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> 
> > This program is free software, covered by the GNU General Public License,
> 
> > and you are welcome to change it and/or distribute copies of it under
> 
> > certain conditions. Enter "help copying" to see the conditions.
> 
> > This program has absolutely no warranty. Enter "help warranty" for details.
> 
> > 
> 
> > GNU gdb (GDB) 7.6
> 
> > Copyright (C) 2013 Free Software Foundation, Inc.
> 
> > License GPLv3+: GNU GPL version 3 or later
> > <http://gnu.org/licenses/gpl.html>
> 
> > This is free software: you are free to change and redistribute it.
> 
> > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> 
> > and "show warranty" for details.
> 
> > This GDB was configured as "x86_64-unknown-linux-gnu"...
> 
> > 
> 
> > KERNEL: /boot/vmlinux-4.1.34-33-debug.gz
> 
> > DUMPFILE: /var/crash/2016-10-31-17:01//vmcore
> 
> > CPUS: 4
> 
> > DATE: Mon Oct 31 13:01:36 2016
> 
> > UPTIME: 02:12:08
> 
> > LOAD AVERAGE: 0.00, 0.00, 0.00
> 
> > TASKS: 204
> 
> > NODENAME: eje-code
> 
> > RELEASE: 4.1.34-33-debug
> 
> > VERSION: #1 SMP Thu Oct 20 08:03:29 UTC 2016 (fe18aba)
> 
> > MACHINE: x86_64 (2094 Mhz)
> 
> > MEMORY: 4 GB
> 
> > PANIC: "sysrq: SysRq : Trigger a crash"
> 
> > PID: 3260
> 
> > COMMAND: "crashtest"
> 
> > TASK: ffff88011a020550 [THREAD_INFO: ffff8800bcd98000]
> 
> > CPU: 3
> 
> > STATE: TASK_RUNNING (SYSRQ)
> 
> > 
> 
> > crash> extend /usr/lib64/crash/extensions/gcore.so
> 
> > /usr/lib64/crash/extensions/gcore.so: shared object loaded
> 
> > crash> gcore -f 0 -v 7 3260
> 
> > gcore: Opening file core.3260.crashtest ...
> 
> > gcore: done.
> 
> > gcore: Writing ELF header ...
> 
> > gcore: done.
> 
> > gcore: Retrieving and writing note information ...
> 
> > Segmentation fault
> 
> > 
> 
> > Sixty-four bytes of core get written before the segmentation fault (I'm
> > guessing that's the ELF header). I can gcore some other processes (although
> > I get many "gcore: WARNING: page fault at 7ffca6a5d000" warnings). I tried
> > this both with an echo from bash on the command line and with a custom test
> > program that just does a controlled crash in a function nested four deep.
> > The segmentation fault sometimes causes a hang (which I can end with
> > Ctrl-C).
> > 
> > It does the same thing if I specify the task address (in this case, "gcore
> > ffff88011a020550"). I've tried it without any options, too, and with
> > different combinations.
> 
> > 
> 
> > I obtained a core dump of gcore and this is my debugging session:
> 
> > 
> 
> > eje-code:~ # gdb /usr/lib64/crash/extensions/gcore.so
> 
> > /var/core/core.eje-code-crash-3074
> 
> > GNU gdb (GDB; openSUSE Leap 42.1) 7.11.1
> 
> > Copyright (C) 2016 Free Software Foundation, Inc.
> 
> > License GPLv3+: GNU GPL version 3 or later
> > <http://gnu.org/licenses/gpl.html>
> 
> > This is free software: you are free to change and redistribute it.
> 
> > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> 
> > and "show warranty" for details.
> 
> > This GDB was configured as "x86_64-suse-linux".
> 
> > Type "show configuration" for configuration details.
> 
> > For bug reporting instructions, please see:
> 
> > <http://bugs.opensuse.org/>.
> 
> > Find the GDB manual and other documentation resources online at:
> 
> > <http://www.gnu.org/software/gdb/documentation/>.
> 
> > For help, type "help".
> 
> > Type "apropos word" to search for commands related to "word"...
> 
> > Reading symbols from /usr/lib64/crash/extensions/gcore.so...done.
> 
> > 
> 
> > warning: core file may not match specified executable file. [Not sure why
> 
> > ...]
> 
> > [New LWP 3074]
> 
> > [Thread debugging using libthread_db enabled]
> 
> > Using host libthread_db library "/lib64/libthread_db.so.1".
> 
> > Core was generated by `crash /boot/vmlinux-4.1.34-33-debug.gz
> 
> > /var/crash/2016-10-31-17:01//vmcore'.
> 
> > Program terminated with signal SIGSEGV, Segmentation fault.
> 
> > #0 0x0000000000000000 in ?? ()
> 
> > Missing separate debuginfos, use: zypper install
> 
> > glibc-debuginfo-2.19-17.4.x86_64 liblzma5-debuginfo-5.0.5-3.5.x86_64
> 
> > libncurses5-debuginfo-5.9-53.4.x86_64 libz1-debuginfo-1.2.8-6.4.x86_64
> 
> > (gdb) bt
> > #0  0x0000000000000000 in ?? ()
> > #1  0x00007f1235eed4e4 in restore_regs_syscall_context (target=0x6939df8, regs=0xf6f280, active_regs=0x7ffefa968880)
> >     at libgcore/gcore_x86.c:1656
> > #2  0x00007f1235eedcb6 in genregs_get (target=0x6939df8, regset=0x7f12360f6460 <x86_64_regsets>, size=216, buf=0xf6f280)
> >     at libgcore/gcore_x86.c:1795
> > #3  0x00007f1235ee6438 in fill_write_thread_core_info (fp=0x59efb10, tc=0x6939df8, dump_tc=0x6939df8, info=0xf6ee80,
> >     view=0x7f12360f5d80 <x86_64_regset_view>, offset=0x7ffefa968ab0, total=0xf6ee98) at libgcore/gcore_coredump.c:469
> > #4  0x00007f1235ee682c in fill_write_note_info (fp=0x59efb10, info=0xf6ee80, phnum=20, offset=0x7ffefa968ab0)
> >     at libgcore/gcore_coredump.c:566
> > #5  0x00007f1235ee4dd1 in gcore_coredump () at libgcore/gcore_coredump.c:112
> > #6  0x00007f1235eeeb8b in do_gcore (arg=0x0) at gcore.c:317
> > #7  0x00007f1235eee926 in cmd_gcore () at gcore.c:253
> > #8  0x0000000000472b8c in ?? ()
> > #9  0x0000000000000000 in ?? ()
> 
> > (gdb) up
> > #1  0x00007f1235eed4e4 in restore_regs_syscall_context (target=0x6939df8, regs=0xf6f280, active_regs=0x7ffefa968880)
> >     at libgcore/gcore_x86.c:1656
> > 1656        regs->sp = gxt->get_old_rsp(target->processor);
> > (gdb) print gxt
> > $1 = (struct gcore_x86_table *) 0x215ea0 <gcore_x86_table>
> > (gdb) print *target
> > $2 = {task = 18446612137045525840, thread_info = 18446612135482589184, pid = 3260, comm = "crashtest\000@XI\215u H",
> >   processor = 3, ptask = 18446612137046565648, mm_struct = 18446612137048351232, tc_next = 0x0}
> > (gdb) print *regs
> > $3 = {r15 = 0, r14 = 2, r13 = 2, r12 = 34324496, bp = 2, bx = 4196186, r11 = 582, r10 = 140728806957456,
> >   r9 = 140048302249728, r8 = 34324720, ax = 18446744073709551578, cx = 140048297135408, dx = 2, si = 140048302292992,
> >   di = 3, orig_ax = 1, ip = 140048297135408, cs = 51, flags = 582, sp = 140728806957864, ss = 43, fs_base = 0,
> >   gs_base = 0, ds = 0, es = 0, fs = 0, gs = 0}
> > (gdb) print *gxt
> > $4 = {get_old_rsp = 0x0, get_thread_struct_fpu = 0x0, get_thread_struct_fpu_size = 0x0, is_special_syscall = 0x0,
> >   is_special_ia32_syscall = 0x0, tsk_used_math = 0x0}
> 
> > =============================
> 
> > 
> 
> > So not only is get_old_rsp zero, all the fields in gxt are zero.
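
Frame #0 at 0x0000000000000000 is what a call through a NULL function
pointer typically looks like on x86_64 Linux, which fits get_old_rsp = 0x0
in the table. A minimal standalone sketch of guarding such a call follows.
The names used here (x86_table, fake_get_old_rsp, read_old_rsp) are made up
for illustration and are not part of the gcore sources:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Cut-down, hypothetical version of the table printed in the gdb session. */
struct x86_table {
	unsigned long (*get_old_rsp)(int cpu);
};

/* Hypothetical handler standing in for any real per-kernel reader. */
unsigned long fake_get_old_rsp(int cpu)
{
	return 0x7ffd00000000UL + (unsigned long)cpu;
}

/*
 * Guarded call: returns 1 and stores the value on success, 0 when no
 * handler was registered. Calling t->get_old_rsp(cpu) directly with a
 * NULL pointer is undefined behaviour; in practice it usually jumps to
 * address 0 and the process dies with SIGSEGV.
 */
int read_old_rsp(const struct x86_table *t, int cpu, unsigned long *sp)
{
	assert(t != NULL && sp != NULL);

	if (t->get_old_rsp == NULL) {
		fprintf(stderr, "no get_old_rsp handler registered\n");
		return 0;
	}

	*sp = t->get_old_rsp(cpu);
	return 1;
}
```

With a zeroed table, read_old_rsp() reports failure and leaves *sp
untouched instead of crashing; with a handler installed it returns the
handler's value.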
> 
> > 
> 
> > Looks like a kernel support issue. This field is filled in by
> 
> > gcore_x86_table_register_get_old_rsp() which looks up four symbols in
> 
> > various forms, none of which exist in my kernel:
> 
> > 
> 
> > eje-code:~ # fgrep old_rsp /proc/kallsyms
> 
> > eje-code:~ # fgrep cpu_pda /proc/kallsyms
> 
> > eje-code:~ #
> 
> > 
> 
> > old_rsp did exist in openSUSE 12.1 and 13.1 (3.11.10-29 for the latter).
> 
> > 
> 
> > According to http://lists.openwall.net/linux-kernel/2015/03/17/766 old_rsp
> > was renamed rsp_scratch. I don't know if the semantics changed -- it doesn't
> > appear so -- but I added code to accept this symbol as an alternative and
> > the core dump generates and works (I can see a correct backtrace). I do not
> > warrant the work though. :-) Someone may want to review my work, and check
> > the other functions and see if they are supposed to be zero. Since they
> > haven't been invoked I don't know if they are supposed to be non-zero or
> > not.
> 
> > 
> 
> > Here is the diff:
> 
> > 
> 
> > --- gcore_x86.c~	2014-11-06 04:58:47.000000000 -0500
> > +++ gcore_x86.c	2016-10-31 16:01:00.989025841 -0400
> > @@ -1351,6 +1351,26 @@ static ulong gcore_x86_64_get_old_rsp(in
> >  }
> >  
> >  /**
> > + * gcore_x86_64_get_rsp_scratch() - get rsp at per-cpu area
> > + *
> > + * @cpu target CPU's CPU id
> > + *
> > + * Given a CPU id, returns a RSP value saved at per-cpu area for the
> > + * CPU whose id is the given CPU id.
> > + */
> > +static ulong gcore_x86_64_get_rsp_scratch(int cpu)
> > +{
> > +	ulong old_rsp;
> > +
> > +	readmem(symbol_value("rsp_scratch") + kt->__per_cpu_offset[cpu],
> > +		KVADDR, &old_rsp, sizeof(old_rsp),
> > +		"gcore_x86_64_get_rsp_scratch: rsp_scratch",
> > +		gcore_verbose_error_handle());
> > +
> > +	return old_rsp;
> > +}
> > +
> > +/**
> >   * gcore_x86_64_get_per_cpu__old_rsp() - get rsp at per-cpu area
> >   *
> >   * @cpu target CPU's CPU id
> > @@ -1834,6 +1854,11 @@ static void gcore_x86_table_register_get
> >  
> >  	else if (symbol_exists("_cpu_pda"))
> >  		gxt->get_old_rsp = gcore_x86_64_get_cpu__pda_oldrsp;
> > +
> > +	else if (symbol_exists("rsp_scratch"))
> > +		gxt->get_old_rsp = gcore_x86_64_get_rsp_scratch;
> > +
> > +	if (!gxt->get_old_rsp) printf ("Warning: NO gxt->get_old_rsp\n");
> >  }
> >  #endif
> 
> > 
> 
> > 
> 
> > 
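
The address arithmetic in the new function is the symbol's value plus the
target CPU's per-CPU offset. A small standalone model of that lookup is
sketched below, assuming a fake memory buffer in place of the dump file.
fake_mem, fake_readmem, fake_setup, rsp_scratch_sym, per_cpu_offset and all
the numbers are hypothetical; they only mimic the shape of the readmem()
call in the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_CPUS   4
#define AREA_SIZE 64UL

/* Fake "dump memory": one per-CPU area per CPU, laid out back to back. */
unsigned char fake_mem[NR_CPUS * AREA_SIZE];

/* Stand-ins for symbol_value("rsp_scratch") and kt->__per_cpu_offset[]. */
const unsigned long rsp_scratch_sym = 16;
unsigned long per_cpu_offset[NR_CPUS];

/* Bounds-checked copy out of the fake memory; returns 1 on success. */
int fake_readmem(unsigned long addr, void *buf, size_t len)
{
	if (addr > sizeof(fake_mem) || len > sizeof(fake_mem) - addr)
		return 0;
	memcpy(buf, fake_mem + addr, len);
	return 1;
}

/* Give each CPU's copy of the variable a recognisable value. */
void fake_setup(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		unsigned long val = 0x1000UL * (unsigned long)(cpu + 1);

		per_cpu_offset[cpu] = (unsigned long)cpu * AREA_SIZE;
		memcpy(fake_mem + per_cpu_offset[cpu] + rsp_scratch_sym,
		       &val, sizeof(val));
	}
}

/* Same arithmetic as the patch: symbol value + that CPU's offset. */
int get_rsp_scratch(int cpu, unsigned long *out)
{
	assert(out != NULL);

	if (cpu < 0 || cpu >= NR_CPUS)
		return 0;
	return fake_readmem(rsp_scratch_sym + per_cpu_offset[cpu],
			    out, sizeof(*out));
}
```

After fake_setup(), asking for CPU 3 reads from that CPU's own area, and
an out-of-range CPU id is rejected before any memory is touched.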
> 
> > --
> 
> > Crash-utility mailing list
> 
> > 
> Crash-utility@xxxxxxxxxx
> 
> > 
> https://www.redhat.com/mailman/listinfo/crash-utility


