Hello Dave,

Thanks for your observations.

From: Dave Anderson <anderson@xxxxxxxxxx>
Subject: Re: [ANNOUNCE][RFC] gcore extension module: user-mode process core dump
Date: Mon, 24 Jan 2011 14:27:39 -0500 (EST)

>
>
> ----- Original Message -----
>> gcore extension module provides a means to create ELF core dump for
>> user-mode process that is contained within crash kernel dump. I design
>> this to behave as kernel's ELF core dumper.
>>
>> For previous discussion, see:
>> https://www.redhat.com/archives/crash-utility/2010-August/msg00001.html
>
> A few observations...
>
> I'll fix unwind_x86_64.h to prevent this build warning:
>
> # make extensions
> ...
> gcc -Wall -I.. -I./libgcore -fPIC -DX86_64 -c -o libgcore/gcore_x86.o libgcore/gcore_x86.c
> In file included from libgcore/gcore_x86.c:19:
> ../unwind_x86_64.h:61:1: warning: "offsetof" redefined
> In file included from libgcore/gcore_x86.c:17:
> ../defs.h:60:1: warning: this is the location of the previous definition
> ...
>

The warning is caused by IO_BITMAP_OFFSET, which is defined but unused
in gcore_x86.c. So it seems to me that the part to fix is gcore_x86.c,
not unwind_x86_64.h.

> But the gcore.mk file should gracefully fail to build on non-supported
> architectures. It ends up spewing ~200 lines of error messages when
> attempted, for example, on a ppc64 machine:

Yes, I know it behaves like this when built on unsupported
architectures. I had understood this to be implicitly permitted, given
the similar build errors from sial. But if that is in fact wrong, I'll
make it buildable on unsupported architectures.

Part of gcore can be shared among different architectures: essentially
everything except the collection of the various kinds of note
information, which is inherently architecture specific. I'll fix this
so that gcore on unsupported architectures produces an ELF core
containing only PT_LOAD segments.
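
As a rough sketch of that fallback, the architecture check could decide
which program headers get written. Everything named here
(gcore_arch_has_notes, gcore_plan_segments, the GCORE_* flags) is
hypothetical and not taken from the gcore sources; only the X86_64
macro appears in the build line above, and X86 is assumed by analogy.

```c
#include <assert.h>

/* Hypothetical flags: which kinds of program headers the dump will contain. */
enum gcore_segment {
    GCORE_PT_NOTE = 1 << 0,   /* register and process notes: arch specific */
    GCORE_PT_LOAD = 1 << 1    /* memory contents: arch independent */
};

/* Nonzero when note collection is implemented for the build architecture.
 * X86_64 is passed as -DX86_64 in the build line; X86 is an assumption. */
static int gcore_arch_has_notes(void)
{
#if defined(X86) || defined(X86_64)
    return 1;
#else
    return 0;
#endif
}

/* Always plan PT_LOAD; add PT_NOTE only when the architecture supports it. */
static unsigned gcore_plan_segments(int has_notes)
{
    unsigned plan = GCORE_PT_LOAD;

    assert(has_notes == 0 || has_notes == 1);
    if (has_notes)
        plan |= GCORE_PT_NOTE;
    return plan;
}
```

A caller would use gcore_plan_segments(gcore_arch_has_notes()) and skip
the note-collection step whenever GCORE_PT_NOTE is not set, so the
shared code still builds everywhere.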
>
> Your documentation implies that the command would only work on
> certain kernel versions:
>
>> Compared with the previous version, this release:
>> - supports more kernel versions, and
>> - collects register values more accurately (but still not perfect).
>>
>> Support Range
>> =============
>>
>> |----------------+----------------------------------------------|
>> | ARCH           | X86, X86_64                                  |
>> |----------------+----------------------------------------------|
>> | Kernel Version | RHEL4.8, RHEL5.5, RHEL6.0 and Vanilla 2.6.36 |
>> |----------------+----------------------------------------------|
>
>
> But, for example, on a 2.6.34-2.fc14 kernel (presumably unsupported),
> it seems to work OK on some tasks, but on others it doesn't work so well.
> Here, the "less" command can be dumped OK kernel:
>
>
> crash> sys | grep RELEASE
>      RELEASE: 2.6.34-2.fc14.x86_64
> crash> ps
> ... [ cut ] ...
> >  2080   1490   0  ffff880079ed2480  RU   7.6  289900 159684  crash
>    2084      1   0  ffff880077a7a480  IN   0.1  248592   1936  rsyslogd
>    2090   2080   5  ffff880079ed4900  IN   0.0  105432    828  less
> crash> gcore -v0 2090
> Saved core.2090.less
> crash>
>
> But with the same (full) 2.6.34-2.fc14 dumpfile, it can't seem to handle
> dumping the crash utility itself, and just hangs:
>
> crash> swap
> FILENAME      TYPE         SIZE       USED   PCT  PRIORITY
> /dev/dm-1     PARTITION    18579452k  0k     0%   -1
> crash> ps
> ... [ cut ] ...
> >  2080   1490   0  ffff880079ed2480  RU   7.6  289900 159684  crash
>    2084      1   0  ffff880077a7a480  IN   0.1  248592   1936  rsyslogd
>    2090   2080   5  ffff880079ed4900  IN   0.0  105432    828  less
> crash> gcore -v1 2080
> gcore: Restoring the thread group ...
> gcore: done.
> gcore: Retrieving note information ...
>
> < hangs forever >
>
> ...
>
> I would have thought that it would either work-for-all or work-for-none
> with respect to a particular kernel version?

Sorry, I'm not sure what you mean by ``work-for-all or
work-for-none''. ``Supported kernel versions'' means ``I tested the
gcore extension module on these kernels''.
gcore may well work on other kernel versions too, as long as there is
no incompatibility between them and the tested ones.

>
> In any case, if it's going to fail, perhaps there should be some mechanism
> in place that would prevent it from hanging, and instead print a message
> that the kernel version is not supported? Or if a particular data structure
> is different than the "supported" versions, it should fail immediately?
> Just a thought...

I agree with the former idea. I believe gcore has a good chance of
working well on unsupported kernels.

The part that hangs is likely restore_frame_pointer(), which runs only
when the analyzed kernel is built with CONFIG_FRAME_POINTER=y, and
which recovers the user-space frame pointer by following the saved
base pointers in order. If the kernel stack frames are corrupted, the
unwinding done by this function can behave in unexpected ways. I'll
fix this by limiting the number of frames traced to a finite bound.
The kernel stack size should be a sufficient bound here.

>
> Also I note that "gcore -v7" fails -- shouldn't it be accepted as an argument?
>
> crash> gcore -v7 2080
> gcore: invalid vlevel: 7.
> crash>

Oh, sorry. This is just a bug that should have been caught by my unit
testing.

Thanks. I'll post a fixed version again soon. Please wait a while.

Thanks.
HATAYAMA Daisuke

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility