Hi Dave,
our dump-analysis hosts are still running RHEL6, and I recompile 'crash' from the latest sources myself. To my surprise, crash-7.2.2 built on
a RHEL6 host segfaults immediately when I run the 'mount' command. When I compile it on a newer system (Ubuntu 16.04), it works fine on the same
vmcores!
Just in case, I built crash-7.2.2 again (on the RHEL6 host) without any extra options, just running 'make' after unpacking it. It still
segfaults on all vmcores I tried (RHEL5, RHEL6, RHEL7). The only command that triggers the segfault is 'mount'; all other commands work fine.
Interestingly enough, the 32-bit version of crash-7.2.2 built on the same RHEL6 host works fine (when using 32-bit vmcores).
I suspect that there is some kind of memory corruption in crash-7.2.2 (an out-of-bounds array access?) that is just hidden when building it on
newer hosts due to changes in glibc.
Everything worked fine on RHEL6 with all previous versions of crash; we have been using 7.2.1 for a long time.
Unfortunately, running crash under GDB does not reveal any details:
{alexs 8:30:30} gdb --args /home/alexs/tools/crash-7.2.2/crash vmlinux vmcore.1
Python Exception <type 'exceptions.ImportError'> No module named gdb:
warning:
Could not load the Python gdb module from `/usr/local/share/gdb/python'.
Limited Python support is available from the _gdb module.
Suggest passing --data-directory=/path/to/gdb/data-directory.
GNU gdb (GDB) 7.8.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/alexs/tools/crash-7.2.2/crash...done.
(gdb) r
Starting program: /home/alexs/tools/crash-7.2.2/crash vmlinux vmcore.1
crash 7.2.2
Copyright (C) 2002-2017 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
KERNEL: vmlinux
DUMPFILE: vmcore.1 [PARTIAL DUMP]
CPUS: 14
DATE: Fri Nov 15 07:35:35 2013
UPTIME: 35 days, 05:06:01
LOAD AVERAGE: 491.43, 489.99, 485.49
TASKS: 941
NODENAME: gbrpsrmd0085
RELEASE: 2.6.32-131.6.1.el6.x86_64
VERSION: #1 SMP Mon Jun 20 14:15:38 EDT 2011
MACHINE: x86_64 (2892 Mhz)
MEMORY: 64 GB
PANIC: "SysRq : Trigger a crash"
PID: 0
COMMAND: "swapper"
TASK: ffff88101ca394c0 (1 of 14) [THREAD_INFO: ffff88081cba2000]
CPU: 5
STATE: TASK_RUNNING (SYSRQ)
crash> set scroll off
crash> mount
VFSMOUNT SUPERBLK TYPE DEVNAME DIRNAME
ffff88101c916080 ffff88081c837400 rootfs rootfs /
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
Python Exception <type 'exceptions.ImportError'> No module named gdb.frames:
#0 0x0000000000000000 in ?? ()
#1 0x0000000000000000 in ?? ()
Regards,
Alex
On 2018-05-16 03:37 PM, Dave Anderson wrote:
Download from: http://people.redhat.com/anderson
or
https://github.com/crash-utility/crash/releases
The github master branch serves as a development branch that will contain
all patches that are queued for the next release:
$ git clone git://github.com/crash-utility/crash.git
Changelog:
- Fix to support Linux 4.16-rc1 and later ARM64 kernels, which
fail during session initialization with the error message
"crash: cannot determine page size". The failure to determine
the page size is due to the combination of the following kernel
commits:
- Linux 4.6 commit 6ad1fe5d9077a1ab40bf74b61994d2e770b00b14
arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
- Linux 4.10 commit 4b65a5db362783ab4b04ca1c1d2ad70ed9b0ba2a
arm64: Introduce uaccess_{disable,enable} functionality based on TTBR0_EL1
- Linux 4.16 commit 1e1b8c04fa3451e2b7190930adae43c95f0fae31
arm64: entry: Move the trampoline to be before PAN
(takahiro.akashi@xxxxxxxxxx)
- Fix the search for the booted kernel on a live system to prevent
selecting the unusable "vmlinux.o" file found in private build
directories. Without the patch, the non-executable vmlinux.o file
may be selected, and the resulting fatal error message indicates a
somewhat misleading "crash: cannot resolve _stext".
(bhsharma@xxxxxxxxxx, anderson@xxxxxxxxxx)
- Implemented a new "ps -A" option that restricts the task output to
just the active tasks on each cpu.
(atomlin@xxxxxxxxxx)
- As the first step in optimizing the is_page_ptr() function, save
the maximum SPARSEMEM section number during initialization, and
use it as the topmost delimiter in subsequent mem_section searches.
Also allow for per-architecture machdep->is_page_ptr() plugin functions.
(anderson@xxxxxxxxxx)
- Implemented the x86_64 machdep->is_page_ptr() plugin function. If
the kernel is configured with CONFIG_SPARSEMEM_VMEMMAP, the plugin
function optimizes the mem_section search, reducing the computation
effort and time consumed by commands that repeatedly call the
is_page_ptr() function on large-memory systems.
(k-hagio@xxxxxxxxxxxxx)
- Fixes for 32-bit X86 "bt" command on kernels that have been compiled
with retpoline gcc support. Without the patch, backtraces may fail
with the error message "bt: cannot resolve stack trace", followed by
the text symbols found on the stack and possible exception frames.
(anderson@xxxxxxxxxx)
- Fix the "help foreach" argument list to include the new "gleader"
task qualifier option that was added in version 7.1.2.
(anderson@xxxxxxxxxx)
- VMware VMSS dumpfiles contain the state of each vCPU at the time
when the VM was suspended. This patch enables crash to read the
relevant registers from each vCPU state for use as the starting hooks
by the "bt" command. Also, support for "help -[D|n]" to display
dumpfile contents, and "help -r" to display vCPU register sets has
been implemented. This is also the first step towards implementing
automatic KASLR offset calculations for VMSS dumpfiles.
(slp@xxxxxxxxxx)
- Commit 45b74b89530d611b3fa95a1041e158fbb865fa84 added support for
calculating phys_base and the mapped kernel offset for KASLR-enabled
kernels on SADUMP dumpfiles by using a technique developed by Takao
Indoh. Originally, the patchset included support for kdumps, but this
was dropped in v2, as it was deemed unnecessary due to the upstream
implementation of the "vmcoreinfo device" in QEMU. However, there
are still several reasons for which the vmcoreinfo device may not be
present at the time when a memory dump is taken from a VM, ranging
from a host running older QEMU/libvirt versions, to misconfigured VMs
or environments running hypervisors that don't support this device.
This patchset generalizes the KASLR-related functions from sadump.c
and moves them to kaslr_helper.c, and makes kdump analysis fall back
to KASLR offset calculation if vmcoreinfo data is missing.
(slp@xxxxxxxxxx)
- Fix for the "bt" command on 4.16 and later kernels in which the
"thread_union" data structure is not contained in the vmlinux file's
debuginfo data. Without the patch, the kernel stack size is not
calculated correctly, and defaults to 8K. As a result "bt" fails
with the message "bt: invalid RSP: <address> bt->stackbase/stacktop:
<address>/<address> cpu: <number>".
(efault@xxxxxx)
- Fix for the x86_64 "bt" command for kernels that are configured with
CONFIG_FRAME_POINTER. Without the patch, the per-text-return-address
framesize cache may contain invalid entries for functions that have
an "and $0xfffffffffffffff0,%rsp" instruction in their prologue,
which aligns the stack on a 16-byte boundary; therefore any cached
framesize for a text-return-address in such a function may be
incorrect depending upon the alignment of the stack address of a
calling function. If an invalid cached framesize is utilized by
"bt", the backtrace may skip over several frames, or may display
one or more invalid (stale) frames. The patch introduces a new
cache that contains functions for which framesize values should
not be cached.
(anderson@xxxxxxxxxx)
- Speed up the "bt" command by avoiding the text value cache that
was put in place many years ago when the crash utility supported the
analysis of remote dumpfiles using the deprecated "crash daemon"
running on the remote host. The performance improvement will be
most noticeable when running the first instance of "foreach bt",
where there would often be a "hitch" when it was determining the
framesize of kernel module text return addresses.
(anderson@xxxxxxxxxx)
- Optimization of the crash startup time and "ps" command processing
time when analyzing dumpfiles/systems with extremely large task
counts. For example, running with a dumpfile containing over a
million tasks, startup time and "ps" processing time was reduced
from 90 minutes to less than 40 seconds.
(gthelen@xxxxxxxxxx)
- Speed up the "ps -r" option by stashing the length of the
task_struct.rlim or signal_struct.rlim array in the internal
array_table[]. Without the patch, the length of the array
is determined by a call to the embedded gdb module for each
task, and as a result, the command takes a minute or more
per 1000 tasks. With the patch applied, it only takes about
0.5 seconds per 1000 tasks.
(k-hagio@xxxxxxxxxxxxx)
- Added a new "tree -l" option for the rbtree display, which dumps
the tree sorted in linear order, starting with the leftmost node and
progressing to the right. Also, if a corrupted rb_node pointer is
encountered, do not fail immediately, but rather display the rb_node
address and the corrupt pointer and continue.
(neelx@xxxxxxxxxx)
- Display a fatal error message if the "tree -l" option is attempted
with radix trees. Without the patch, the option would be silently
ignored.
(neelx@xxxxxxxxxx)
- Introduction of a new "bpf" command that displays information about
loaded eBPF (extended Berkeley Packet Filter) programs and maps.
Because of its upstream fluidity, the capabilities of this command
will be an ongoing task. In its initial form, the command displays
the addresses, basic information, and key data structures of eBPF
programs and maps. It also translates the bytecode, and disassembles
the jited code, of loaded eBPF programs.
(anderson@xxxxxxxxxx)
- Fixes to address several gcc-8.0.1 compiler warnings that are generated
when building with "make warn". The warnings are all false alarm
messages of type [-Wformat-overflow=], [-Wformat-truncation=] and
[-Wstringop-truncation]; the affected files are extensions.c, task.c,
kernel.c, memory.c, remote.c, symbols.c, filesys.c and xen_hyper.c.
(anderson@xxxxxxxxxx)
- Fix for the "ps -a" option for a user task that has utilized
"prctl(PR_SET_MM, ...)" to self-modify its memory map such
that the stack locations of its command line arguments and
environment variables are not contiguous. Without the
patch, the command may fail with a dump of the crash utility's
internal buffer usage statistics followed by "ps: cannot allocate
any more memory!".
(k-hagio@xxxxxxxxxxxxx)
- Fix for a compilation error on ARM64. Without the patch, the
compilation of the new bpf.c file fails with the error message
"bpf.c:881:18: error: conflicting types for 'u64'"
(anderson@xxxxxxxxxx)
- Fix for an s390x session initialization-time warning that indicates
"WARNING: cannot determine MAX_PHYSMEM_BITS" on Linux 4.15 and later
kernels containing commit 83e3c48729d9ebb7af5a31a504f3fd6aff0348c4,
which changed the data type of "mem_section" from an array to a
pointer. Without the patch, the s390x manner of determining
MAX_PHYSMEM_BITS fails because it presumes that "mem_section" is
an array, and as a result, displays the warning message.
(anderson@xxxxxxxxxx)
- Fix for the determination of the ARM64 phys_offset value when
running live against /proc/kcore. Without the patch, the message
"WARNING: cannot access vmalloc'd module memory" may be displayed
during session initialization, and vmalloc/module memory will be
inaccessible. (It should be noted that at the time of this patch,
the upstream version of /proc/kcore does not work correctly for
ARM64, because PT_LOAD segments for unity-mapped blocks of physical
memory are not generated.)
(anderson@xxxxxxxxxx)
- For live system analysis, if both "/dev/mem" and the "/dev/crash"
memory driver do not exist, try to use "/proc/kcore". Without
the patch, the session fails immediately with the error message
"crash: /dev/mem: No such file or directory".
(anderson@xxxxxxxxxx)
- Fix, and an update, for the "ipcs" command. The fix addresses an
error where IPCS entries are not displayed because of a faulty
read of the "deleted" member of the embedded "kern_ipc_perm" data
structure. The "deleted" member was being read as a 4-byte integer,
but since it is declared as a "bool" type, only the lowest byte gets
set to 1 or 0. Since the structure is not zeroed-out when allocated,
stale data may be left in the upper 3 bytes, and the IPCS entry
gets rejected. The update is required for Linux 4.11 and greater
kernels, which reimplemented the IDR facility to use radix trees
in kernel commit 0a835c4f090af2c76fc2932c539c3b32fd21fbbb, titled
"Reimplement IDR and IDA using the radix tree". Without the patch,
if any IPCS entry exists, the command would fail with the message
"ipcs: invalid structure member offset: idr_top"
(anderson@xxxxxxxxxx)
- Second stage of the new "bpf" command. This patch adds additional
per-program and per-map data for the "bpf -p ID" and "bpf -m ID"
options, containing data items shown by the "bpftool prog list"
and "bpftool map list" options; new "bpf -P" and "bpf -M" options
have been added that dump the extra data for all loaded programs
or maps.
(anderson@xxxxxxxxxx)
- Fix for a compilation error of the new "bpf.c" file when building
on older host systems where CLOCK_BOOTTIME does not exist.
(anderson@xxxxxxxxxx)
- Fix for infrequent failures of the x86 "bt" command to handle cases
where a user space task has "resume_userspace" or "entry_INT80_32"
at the top of the stack, or was interrupted by the crash NMI
while handling a timer interrupt. Without the patch, the backtrace
would be preceded by the error message "bt: cannot resolve stack
trace", followed by a dump of the text symbols found on the stack
and all possible exception frames.
(anderson@xxxxxxxxxx)
- Trivial formatting fix to "bpf" help page.
(anderson@xxxxxxxxxx)
- Fix the "bpf" command display on Linux 4.17-rc1 and later kernels,
which contain two new program types, BPF_PROG_TYPE_RAW_TRACEPOINT
and BPF_PROG_TYPE_CGROUP_SOCK_ADDR. Without the patch, the dynamic
header string created for bpf programs overran into the bpf map
header, creating one long combined header string.
(anderson@xxxxxxxxxx)
- Updates for the presumption that system call names begin with "sys_".
In Linux 4.17, x86_64 system calls may begin with "__x64_sys", where,
for example, "sys_read" has been replaced by "__x64_sys_read".
(anderson@xxxxxxxxxx)
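As an aside on the "ipcs" fix described in the changelog above: the failure mode (a one-byte "bool" read as a 4-byte integer from memory that was not zeroed) can be reproduced in a few lines. This is only an illustrative sketch in Python, not code from crash itself; the byte values and variable names are made up for the example, and a little-endian layout is assumed.

```python
import struct

# Hypothetical 4-byte slot from a structure that was not zeroed at allocation.
# Byte 0 holds the C "bool" value (0 = not deleted); bytes 1..3 are stale data.
raw = bytes([0x00, 0xAD, 0xDE, 0x5A])

# Faulty read: treat all four bytes as a little-endian int.
# The stale upper bytes make the result non-zero even though the bool is 0.
deleted_as_int = struct.unpack("<i", raw)[0]

# Corrected read: only the first byte carries the bool value.
deleted_as_bool = struct.unpack("<?", raw[:1])[0]

print(f"read as 4-byte int : {deleted_as_int:#x}")
print(f"read as 1-byte bool: {deleted_as_bool}")
```

With the 4-byte read, an entry whose "deleted" flag is actually 0 looks deleted and would be rejected; the 1-byte read returns the intended value.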
--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility
--
Alex Sidorenko Expert Technologist
ERT Linux HPE Pointnext
asid@xxxxxxx +1 514-941-8030 Mobile
2344 Boulevard Alfred Nobel, Saint-Laurent, QC, Canada