Re: [RFC] gcore subcommand: a process coredump feature

HATAYAMA Daisuke <d.hatayama@xxxxxxxxxxxxxx> · Tue, 03 Aug 2010 15:17:00 +0900 (Tokyo Standard Time)

Hello Iguchi-san,

Thanks for your comments.

From: "S.Iguchi" <iguchi.sg@xxxxxxxxxxxxxx>
Subject: Re:  [RFC] gcore subcommand: a process coredump feature
Date: Tue, 03 Aug 2010 13:10:09 +0900 (JST)

> Hi, Hatayama-san
> 
> I have an extension with much the same purpose as your patch.
> But your patch is great, because it supports the latest kernel and
> also dump filter masking.
> 
> My current extension file is attached.
> Yes, my code is quite buggy and ugly, and it handles the latest kernel
> less well than yours.
> (Sigh... I did not know about fill_vma_cache(), so I run "vm -p" every time before dumping.)
> 
> BTW, I have some comments.
> I'd like to add the features below to yours,
> or if you will do it, I would be happy. :)
> 
> - support i386
> - support elf32 binaries on x86-64
> - support old kernels (before 2.6.17)
> 
> As Dave said, if your patch is committed as an extension,
> I could submit some patches to it.
> 
> How about this?

As I wrote in my first post, I plan to support RHEL4, RHEL5 and RHEL6
on i386, x86_64 and IA64, as well as the latest upstream kernel. The
following table shows the upstream kernel version each one
corresponds to.

   RHEL4  RHEL5   RHEL6   upstream
  ---------------------------------
   2.6.9  2.6.18  2.6.32  2.6.35

So that plan should probably cover your first and third requests
(i386 and kernels before 2.6.17).

On the other hand, I have no plan to support ia32 emulation on either
x86_64 or ia64, which is your second request.
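
To make the version boundary concrete, here is a minimal sketch in C
of how the targets in the table could be compared against the 2.6.17
boundary from your third request. It is only an illustration, not code
from the patch: KVER(), struct target, target_version() and
needs_old_kernel_path() are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: pack a kernel version so versions compare as integers. */
#define KVER(major, minor, patch) \
    (((unsigned)(major) << 16) | ((unsigned)(minor) << 8) | (unsigned)(patch))

/* Base upstream version of each target, taken from the table above. */
struct target {
    const char *name;
    unsigned version;
};

static const struct target targets[] = {
    { "RHEL4",    KVER(2, 6, 9)  },
    { "RHEL5",    KVER(2, 6, 18) },
    { "RHEL6",    KVER(2, 6, 32) },
    { "upstream", KVER(2, 6, 35) },
};

/* Return the packed version for a target name, or 0 if the name is unknown. */
static unsigned target_version(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(targets) / sizeof(targets[0]); i++)
        if (strcmp(targets[i].name, name) == 0)
            return targets[i].version;
    return 0;
}

/* True when a known target predates 2.6.17 (the "old kernel" boundary). */
static int needs_old_kernel_path(const char *name)
{
    unsigned v = target_version(name);

    return v != 0 && v < KVER(2, 6, 17);
}
```

With these definitions only RHEL4 (2.6.9) falls before 2.6.17, so
supporting RHEL4 is what would exercise the old-kernel code paths.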

> 
> Best regards,
> Seigo Iguchi
> 
> 
> From: HATAYAMA Daisuke <d.hatayama@xxxxxxxxxxxxxx>
> Subject:  [RFC] gcore subcommand: a process coredump feature
> Date: Mon, 02 Aug 2010 18:00:02 +0900 (Tokyo Standard Time)
> 
>> Hello,
>> 
>> For some weeks I've been developing a gcore subcommand for the crash
>> utility. It provides a process coredump feature for kernel crash
>> dumps, which is strongly demanded by users who want to investigate
>> user-space applications contained in a kernel crash dump.
>> 
>> I've now finished a prototype version of gcore and identified the
>> issues that most need to be addressed. Could you give me any
>> comments and suggestions on this work?
>> 
>> 
>> Motivation
>> ==========
>> 
>> It's a relatively familiar technique in a cluster system for a
>> running node to trigger the kernel crash dump mechanism when it
>> detects a critical error, so that the error-detecting server stops
>> as soon as possible. Consequently, the resulting kernel crash dump
>> contains the process image of the erroneous user application. In
>> that case, developers are interested in user space rather than
>> kernel space.
>> 
>> Another merit of gcore is that it allows us to use userland
>> debugging tools, such as GDB and binutils, to analyze user-space
>> memory.
>> 
>> 
>> Current Status
>> ==============
>> 
>> I have confirmed that the prototype version runs on the following
>> configuration:
>> 
>>   Linux Kernel Version: 2.6.34
>>   Supported Architecture: x86_64
>>   Crash Version: 5.0.5
>>   Dump Format: ELF
>> 
>> I'm planning to widen the range of support as follows:
>> 
>>   Linux Kernel Version: Any
>>   Supported Architecture: i386, x86_64 and IA64
>>   Dump Format: Any
>> 
>> 
>> Issues
>> ======
>> 
>> Currently, I have the issues below.
>> 
>> 1) Retrieval of appropriate register values
>> 
>> The prototype version retrieves register values from the _wrong_
>> location: the top of the kernel stack, where register values are
>> saved at every preemption context switch. The register values that
>> should be included here are instead the ones saved at the
>> user-to-kernel context switch on any interrupt event.
>> 
>> I've yet to implement this. Specifically, I still need to do the
>> following tasks.
>> 
>>   (1) list all entry paths from user-space to kernel-space execution.
>> 
>>   (2) classify the entries according to where and how the register
>>   values from the user-space context are saved.
>> 
>>   (3) write a program that retrieves the saved register values from
>>   the appropriate locations identified by means of (1) and (2).
>> 
>> Ideally, I think it would be best if the crash library provided a
>> means of retrieving this kind of register values, that is, ones
>> saved on various stack frames. Is there any plan to do so?
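
As a rough illustration of one candidate location for step (3), the
sketch below computes where a full user-mode register frame would sit,
assuming the x86_64 layout in which the entry code saves a
pt_regs-style frame at the high-address end of the task's kernel
stack (the location the kernel's task_pt_regs() helper points at).
struct user_frame and user_frame_addr() are hypothetical names, the
21-slot layout is assumed to mirror x86_64 struct pt_regs, and the
caller must supply the stack base and THREAD_SIZE (assumed to be 8192
on x86_64 kernels of this era). Some entry paths may fill only part of
this frame, which is why the classification in (2) is still needed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 layout: 21 eight-byte slots, mirroring struct pt_regs. */
struct user_frame {
    uint64_t r15, r14, r13, r12, bp, bx;
    uint64_t r11, r10, r9, r8, ax, cx, dx, si, di;
    uint64_t orig_ax;
    uint64_t ip, cs, flags, sp, ss;
};

/*
 * Hypothetical helper: address of the user-mode frame, given the lowest
 * address of the task's kernel stack and the stack size (THREAD_SIZE).
 * The frame is assumed to end exactly at the high end of the stack.
 */
static uint64_t user_frame_addr(uint64_t stack_base, uint64_t thread_size)
{
    assert(thread_size >= sizeof(struct user_frame));
    return stack_base + thread_size - sizeof(struct user_frame);
}
```

For a stack based at address B with an 8192-byte THREAD_SIZE, this
gives B + 8192 - 168, since the assumed frame is 21 * 8 = 168 bytes.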
>> 
>> 
>> 2) Getting the signal number for a task that was in the middle of a
>> core dump when the kernel crashed
>> 
>> If a target task is partway through a core dump, it's better to know
>> the signal number in order to know why the task was about to be core
>> dumped.
>> 
>> Unfortunately, I have no choice but to backtrace the kernel stack to
>> retrieve the signal number saved there as an argument of, for
>> example, do_coredump().
>> 
>> 
>> 3) Kernel version compatibility
>> 
>> crash's policy is for the latest crash package to support all kernel
>> versions. On the other hand, the prototype is based on kernel 2.6.34.
>> This means more kernel versions need to be supported.
>> 
>> Well, the question is: which versions do I really need to test in
>> addition to the latest upstream kernel? I think it's practically
>> enough to support RHEL4, RHEL5 and RHEL6.
>> 
>> 
>> Build Instructions
>> ==================
>> 
>>   $ tar xf crash-5.0.5.tar.gz
>>   $ cd crash-5.0.5/
>>   $ patch -p 1 < gcore.patch
>>   $ make
>> 
>> 
>> Usage
>> =====
>> 
>> Use the help subcommand of the crash utility: ``help gcore''.
>> 
>> 
>> Attached File
>> =============
>> 
>>   * gcore.patch
>> 
>>     A patch implementing the gcore subcommand for crash-5.0.5.
>> 
>>     The diffstat output is as follows.
>> 
>> $ diffstat gcore.patch
>>  Makefile      |   10 +-
>>  defs.h        |   15 +
>>  gcore.c       | 1858 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  gcore.h       |  639 ++++++++++++++++++++
>>  global_data.c |    3 +
>>  help.c        |   28 +
>>  netdump.c     |   27 +
>>  tools.c       |   37 ++
>>  8 files changed, 2615 insertions(+), 2 deletions(-)
>> 
>> --
>> HATAYAMA Daisuke
>> d.hatayama@xxxxxxxxxxxxxx
