Hello Dave,

Thank you for your comment.

From: anderson@xxxxxxxxxxxx
Subject: Re: [RFC] gcore subcommand: a process coredump feature
Date: Mon, 2 Aug 2010 19:02:23 -0400 (EDT)

>> Hello,
>>
>> For some weeks I've been developing a gcore subcommand for the crash
>> utility. It provides a process coredump feature for kernel crash
>> dumps, which is strongly demanded by users who want to investigate
>> user-space applications contained in a kernel crash dump.
>>
>> I've now finished a prototype version of gcore and found the issues
>> that need to be addressed. Could you give me any comments and
>> suggestions on this work?
>
> Hello Daisuke,
>
> As I mentioned in my previous email re: cpu numbering, I am currently
> on vacation, and cannot spend much time looking at this issue until
> I get back on August 9th.
>
> However, I think that this could be a useful feature, and I did
> take a quick look at how it could be done several months ago when
> it was brought up on this mailing list. However, as you discovered,

This is the first I have heard that a similar proposal was made earlier
on this mailing list. I will try to find it and compare it with mine.

> I also noted that the user-space core dump code in the kernel has
> undergone significant changes over time, and so the implementation
> by the crash utility would have to adapt to the kernel data structures
> used by the various kernel versions. And because of that, I don't
> want to put it into the base crash binary, but rather it should be
> maintained as one or more extension modules, which can be located
> in the "extensions" subdirectory in the crash source package, as well
> as stored in the "extensions" web page link from the crash "people"
> web site.

I basically agree, but I think the main flow of gcore can be shared
across kernel versions, since the only kernel data structures it depends
on are the mm member of task_struct, the mmap member of mm_struct, and
the vm_* members of vm_area_struct.
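To illustrate how small that shared part could be, here is a minimal
sketch in C. It is not taken from gcore.patch. The struct definitions
are simplified stand-ins that model only the members named above, and
the names collect_segments(), vm_flags_to_pf(), struct segment and the
GC_PF_* constants are hypothetical. In a real crash extension each
pointer dereference would be a read from the dump image instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins: NOT the real kernel definitions. Only the
 * members the walk needs are modeled. */
struct vma {                    /* models struct vm_area_struct */
    uint64_t vm_start;          /* first address of the mapping */
    uint64_t vm_end;            /* first address past the mapping */
    uint64_t vm_flags;          /* VM_READ / VM_WRITE / VM_EXEC bits */
    struct vma *vm_next;        /* next mapping in the list */
};

struct mm {                     /* models struct mm_struct */
    struct vma *mmap;           /* head of the VMA list */
};

struct task {                   /* models struct task_struct */
    struct mm *mm;              /* NULL for kernel threads */
};

/* Kernel permission bits in vm_flags. */
#define VM_READ   0x1
#define VM_WRITE  0x2
#define VM_EXEC   0x4

/* ELF segment permission bits (same values as PF_X, PF_W, PF_R),
 * redefined here so the sketch does not depend on <elf.h>. */
#define GC_PF_X   0x1
#define GC_PF_W   0x2
#define GC_PF_R   0x4

/* Hypothetical record: what one PT_LOAD program header would need. */
struct segment {
    uint64_t vaddr;
    uint64_t memsz;
    uint32_t flags;
};

/* Translate kernel VMA permissions into ELF segment permissions. */
uint32_t vm_flags_to_pf(uint64_t vm_flags)
{
    uint32_t pf = 0;

    if (vm_flags & VM_READ)
        pf |= GC_PF_R;
    if (vm_flags & VM_WRITE)
        pf |= GC_PF_W;
    if (vm_flags & VM_EXEC)
        pf |= GC_PF_X;
    return pf;
}

/* Walk task->mm->mmap and record one segment per VMA.
 * Stops when out[] is full; returns the number of entries written. */
size_t collect_segments(const struct task *t, struct segment *out,
                        size_t max)
{
    size_t n = 0;

    assert(t != NULL && out != NULL);
    if (t->mm == NULL)          /* kernel thread: no user memory */
        return 0;

    for (const struct vma *v = t->mm->mmap; v != NULL && n < max;
         v = v->vm_next) {
        out[n].vaddr = v->vm_start;
        out[n].memsz = v->vm_end - v->vm_start;
        out[n].flags = vm_flags_to_pf(v->vm_flags);
        n++;
    }
    return n;
}
```

Nothing in this walk touches registers or signal state. Those parts
differ between kernel versions and would live in the version-specific
code that gathers the note information.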
So I think it is possible to keep the main flow in the crash binary and
to distribute only the kernel-version-specific parts, which gather the
various kinds of note information, as shared libraries.

>
> It is quite simple to re-adapt your patch as an extension module.
> Check the "snap.c" and "snap.mk" files in the extensions subdirectory
> as templates for your "gcore" command.
>
> As to the other questions below, I will get back to you after
> August 9th.

Thanks. I look forward to your further comments.

>
> Thanks,
> Dave
>
>
>> Motivation
>> ==========
>>
>> It's a relatively familiar technique that, in a cluster system, a
>> running node triggers the crash kernel dump mechanism when it
>> detects a critical error, so that the server that detected the error
>> stops as soon as possible. Consequently, the resulting kernel crash
>> dump contains a process image of the erroneous user application. In
>> that case, developers are interested in user space rather than
>> kernel space.
>>
>> Another merit of gcore is that it allows us to use userland
>> debugging tools, such as GDB and binutils, to analyze user-space
>> memory.
>>
>>
>> Current Status
>> ==============
>>
>> I have confirmed that the prototype version runs on the following
>> configuration:
>>
>>   Linux Kernel Version: 2.6.34
>>   Supported Architecture: x86_64
>>   Crash Version: 5.0.5
>>   Dump Format: ELF
>>
>> I'm planning to widen the range of support as follows:
>>
>>   Linux Kernel Version: Any
>>   Supported Architecture: i386, x86_64 and IA64
>>   Dump Format: Any
>>
>>
>> Issues
>> ======
>>
>> Currently, I have the issues below.
>>
>> 1) Retrieval of appropriate register values
>>
>> The prototype version retrieves register values from a _wrong_
>> location: the top of the kernel stack, into which register values
>> are saved at any preemption context switch.
>> On the other hand, the register values that should be included here
>> are the ones saved at the user-to-kernel context switch on any
>> interrupt event.
>>
>> I've yet to implement this. Specifically, I still need to do the
>> following tasks.
>>
>>   (1) list all entry points on the execution path from user space to
>>       kernel space.
>>
>>   (2) group the entry points according to where and how the register
>>       values from the user-space context are saved.
>>
>>   (3) write a program that retrieves the saved register values from
>>       the appropriate locations identified by means of (1) and (2).
>>
>> Ideally, I think it would be best if the crash library provided a
>> means of retrieving this kind of register values, that is, ones
>> saved on various stack frames. Is there any plan to do so?
>>
>>
>> 2) Getting a signal number for a task which was in the middle of a
>>    core dump at kernel crash
>>
>> If a target task is halfway through the core dump process, it's
>> better to know the signal number in order to know why the task was
>> about to be core dumped.
>>
>> Unfortunately, I have no choice but to backtrace the kernel stack to
>> retrieve the signal number saved there as an argument of, for
>> example, do_coredump().
>>
>>
>> 3) Kernel version compatibility
>>
>> crash's policy is to support all kernel versions with the latest
>> crash package. On the other hand, the prototype is based on kernel
>> 2.6.34. This means more kernel versions need to be supported.
>>
>> Well, the question is: which versions do I really need to test in
>> addition to the latest upstream kernel? I think it's practically
>> enough to support RHEL4, RHEL5 and RHEL6.
>>
>>
>> Build Instruction
>> =================
>>
>>   $ tar xf crash-5.0.5.tar.gz
>>   $ cd crash-5.0.5/
>>   $ patch -p 1 < gcore.patch
>>   $ make
>>
>>
>> Usage
>> =====
>>
>> Use the help subcommand of the crash utility as ``help gcore''.
>>
>>
>> Attached File
>> =============
>>
>>   * gcore.patch
>>
>>     A patch implementing the gcore subcommand for crash-5.0.5.
>>
>>     The diffstat output is as follows.
>>
>>     $ diffstat gcore.patch
>>      Makefile      |   10 +-
>>      defs.h        |   15 +
>>      gcore.c       | 1858 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>      gcore.h       |  639 ++++++++++++++++++++
>>      global_data.c |    3 +
>>      help.c        |   28 +
>>      netdump.c     |   27 +
>>      tools.c       |   37 ++
>>      8 files changed, 2615 insertions(+), 2 deletions(-)
>>
>> --
>> HATAYAMA Daisuke
>> d.hatayama@xxxxxxxxxxxxxx
>> -------------- next part --------------
>> A non-text attachment was scrubbed...
>> Name: gcore.patch
>> Type: text/x-patch
>> Size: 78046 bytes
>> Desc: not available
>> URL:
>> <https://www.redhat.com/archives/crash-utility/attachments/20100802/710541de/attachment.bin>
>>
>> ------------------------------
>
>
> --
> Crash-utility mailing list
> Crash-utility@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/crash-utility