Hello Dave,
I got a kernel freeze yesterday and was able to open the memory image successfully with the crash utility.
crash> sys
KERNEL: ./usr/lib/debug/usr/lib/modules/4.14.19-coreos/vmlinux
DUMPFILE: gt-Server02-gmt-612746ca.vmss
CPUS: 70
DATE: Wed Feb 21 14:53:20 2018
UPTIME: 1 days, 11:52:25
LOAD AVERAGE: 70.70, 30.98, 12.88
TASKS: 2312
NODENAME: gt-Server02-gmt.com
RELEASE: 4.14.19-coreos
VERSION: #1 SMP Wed Feb 14 03:18:05 UTC 2018
MACHINE: x86_64 (2094 Mhz)
MEMORY: 60 GB
PANIC: ""
crash>
Could you please point me to a couple of things I should check after a kernel freeze, before diving deep to find the root cause?
Thank you,
Eshak
On Wed, Feb 7, 2018 at 7:12 PM, Eshak <tmdeshak@xxxxxxxxx> wrote:
Thank you for the quick info Dave.
I'll deploy the main node with 'nokaslr' boot option and wait for a VM freeze.
-Eshak
On Wed, Feb 7, 2018 at 6:45 PM, anderson <anderson@xxxxxxxxxxxx> wrote:
--
Sent from my Verizon, Samsung Galaxy smartphone
-------- Original message --------
From: Eshak <tmdeshak@xxxxxxxxx>
Date: 2/7/18 9:34 PM (GMT-05:00)
To: "Discussion list for crash utility usage, maintenance and development" <crash-utility@xxxxxxxxxx>
Subject: Re: linux_banner has garbage
Hi Dave,
In a test system I have booted the kernel with 'nokaslr' option. While trying to check phys_base and KASLR:
crash> help -m |grep phys_base
phys_base: 0
text hit rate: 66% (5171 of 7801)
crash> help -k | grep relocate
relocate: 0 (KASLR offset: 0 / 0MB)
text hit rate: 66% (5171 of 7801)
crash>
I'm not sure if phys_base can be 0.
Question: Are these values fine in order to read memory images by specifying --phys_base=0 after booting main machine with 'nokaslr' option?
Yes, but since phys_base defaults to 0, the --machdep argument wouldn't be necessary.
Dave
Thank you,
Eshak
On Wed, Feb 7, 2018 at 10:49 AM, Dave Anderson <anderson@xxxxxxxxxx> wrote:
----- Original Message -----
> Hi Dave,
>
> Thanks for the info.
> I've installed 7.2.0-1.fc28 and was able to run crash on the live system.
>
> Unfortunately, KASLR is enabled.
Yes, I'm afraid that is unfortunate. I don't know how you can determine
what the KASLR offset is, and without that, the dumpfile is pretty
much useless.
The best thing you can do is to prepare for the *next* crash by stashing
the phys_offset and KASLR offset values. You also can boot the kernel with
"nokaslr" on the boot command line.
Dave
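
A note for readers of this thread: the values worth stashing are the ones that `help -m` (phys_base) and `help -k` (relocate / KASLR offset) print, as quoted in this message. The sketch below is one possible way to pull them out of saved command output and sanity-check them. The function name `parse_kaslr_info` and the idea of keeping the output in a string are assumptions made for illustration; they are not part of crash. The check that `relocate` is the 64-bit negation of the KASLR offset is simply what the quoted numbers show.

```python
import re

MASK64 = (1 << 64) - 1


def parse_kaslr_info(text):
    """Extract phys_base, relocate and the KASLR offset (hex) from saved output.

    Hypothetical helper: it only understands the two line shapes quoted
    in this thread ("phys_base: ..." and "relocate: ... (KASLR offset: ...").
    """
    info = {}
    m = re.search(r"phys_base:\s*([0-9a-fA-F]+)", text)
    if m:
        info["phys_base"] = int(m.group(1), 16)
    m = re.search(
        r"relocate:\s*([0-9a-fA-F]+)\s*\(KASLR offset:\s*([0-9a-fA-F]+)", text
    )
    if m:
        info["relocate"] = int(m.group(1), 16)
        info["kaslr_offset"] = int(m.group(2), 16)
    return info


# Lines copied from the live-system output quoted in this thread.
saved = """
phys_base: 10d000000
relocate: ffffffffe1000000 (KASLR offset: 1f000000 / 496MB)
"""

info = parse_kaslr_info(saved)

# For the quoted values, relocate equals the two's-complement negation
# of the KASLR offset when both are treated as 64-bit quantities.
consistent = info["relocate"] == (-info["kaslr_offset"]) & MASK64

# Offset expressed in MB, for comparison with the "/ 496MB" suffix.
offset_mb = info["kaslr_offset"] // (1024 * 1024)

print(hex(info["phys_base"]), hex(info["kaslr_offset"]), offset_mb, consistent)
```

Saving these two values alongside each VM snapshot would make them available later, when the guest is hung and crash cannot be run live.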
>
> text hit rate: 66% (5171 of 7801)
> help -m |grep phys_base
> phys_base: 10d000000
> text hit rate: 66% (5171 of 7801)
> help -k | grep relocate
> relocate: ffffffffe1000000 (KASLR offset: 1f000000 / 496MB)
> text hit rate: 66% (5171 of 7801)
>
> Is there any other info I can get from the vmem/vmss file, such as the
> processes running at the time or tasks blocked on I/O?
>
> Thank you,
> Eshak
>
> On Wed, Feb 7, 2018 at 6:28 AM, Dave Anderson < anderson@xxxxxxxxxx > wrote:
>
>
>
>
> ----- Original Message -----
> > That's fixed upstream. You'll have to download the crash sources from
> > github
> > and build the latest and greatest.
>
> It's possible that you might be able to run the Fedora 28 rawhide version
> here:
>
> Information for build crash-7.2.0-1.fc28
> https://koji.fedoraproject.org/koji/buildinfo?buildID=978501
>
> That version has the fix for the init_level4_pgt issue. I'm not sure
> whether you may run into anything else.
>
> Dave
>
>
> >
> > -------- Original message --------
> > From: Eshak < tmdeshak@xxxxxxxxx >
> > Date: 2/6/18 9:27 PM (GMT-05:00)
> > To: "Discussion list for crash utility usage, maintenance and development"
> > < crash-utility@xxxxxxxxxx >
> > Subject: Re: linux_banner has garbage
> >
> > Hi Dave,
> >
> > I have /proc/kcore, but I'm getting a 'cannot resolve "init_level4_pgt"'
> > error.
> >
> >
> >
> > [root@gt-Server2-gmt proc]# crash
> > /home/mfusion/vmem_vmss_jan26/usr/lib/debug/usr/lib/modules/4.14.11-coreos/vmlinux
> > /proc/kcore
> >
> > crash 7.1.9-3.fc27
> > Copyright (C) 2002-2016 Red Hat, Inc.
> > Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> > Copyright (C) 1999-2006 Hewlett-Packard Co
> > Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> > Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> > Copyright (C) 2005, 2011 NEC Corporation
> > Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> > Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> > This program is free software, covered by the GNU General Public License,
> > and you are welcome to change it and/or distribute copies of it under
> > certain conditions. Enter "help copying" to see the conditions.
> > This program has absolutely no warranty. Enter "help warranty" for details.
> >
> > crash: /dev/tty: No such device or address
> > NOTE: stdin: not a tty
> >
> > GNU gdb (GDB) 7.6
> > Copyright (C) 2013 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "x86_64-unknown-linux-gnu"...
> >
> > WARNING: kernel relocated [496MB]: patching 69420 gdb minimal_symbol values
> >
> > crash: cannot resolve "init_level4_pgt"
> >
> > [root@gt-Server2-gmt proc]#
> >
> > But I believe this is fixed in crash 7.2. I have raised an issue against
> > CoreOS to make crash 7.2 available in the toolbox packages
> > (https://github.com/coreos/bugs/issues/2347).
> >
> > Meanwhile, is there any workaround for this?
> >
> > -Eshak
> >
> > On Tue, Feb 6, 2018 at 6:02 PM, anderson < anderson@xxxxxxxxxxxx > wrote:
> >
> >
> >
> >
> >
> > To run live, you need either /dev/mem, /proc/kcore, or the /dev/crash
> > driver.
> > You could try "crash vmlinux /proc/kcore" to see if it's available. If not,
> > you could try building the /dev/crash driver module. But I don't know if
> > CoreOS offers a kernel-devel package that you could build the driver
> > against? The driver source comes with the crash source package in the
> > memory_driver subdirectory.
> >
> > Dave
> >
> >
> > -------- Original message --------
> > From: Eshak < tmdeshak@xxxxxxxxx >
> > Date: 2/6/18 8:35 PM (GMT-05:00)
> > To: "Discussion list for crash utility usage, maintenance and development"
> > < crash-utility@xxxxxxxxxx >
> > Cc: hfu < hfu@xxxxxxxxxx >
> > Subject: Re: linux_banner has garbage
> >
> > Hi Dave,
> >
> > When trying to run crash live, I'm getting an error saying that /dev/mem is
> > not available.
> > I'm running crash from toolbox in a CoreOS VM. Is crash designed to run
> > from a container?
> >
> > [root@gt-Server2-gmt ~]# crash -d8
> > /home/user/vmem_vmss_jan26/usr/lib/debug/usr/lib/modules/4.14.11-coreos/vmlinux
> >
> > crash 7.1.9-3.fc27
> > Copyright (C) 2002-2016 Red Hat, Inc.
> > Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> > Copyright (C) 1999-2006 Hewlett-Packard Co
> > Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> > Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> > Copyright (C) 2005, 2011 NEC Corporation
> > Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> > Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> > This program is free software, covered by the GNU General Public License,
> > and you are welcome to change it and/or distribute copies of it under
> > certain conditions. Enter "help copying" to see the conditions.
> > This program has absolutely no warranty. Enter "help warranty" for details.
> >
> > get_live_memory_source: /dev/mem
> >
> > crash: /dev/mem: No such file or directory
> >
> > [root@gt-Server2-gmt ~]#
> >
> > Thank you,
> > Eshak
> >
> > On Tue, Feb 6, 2018 at 3:05 PM, Eshak < tmdeshak@xxxxxxxxx > wrote:
> >
> >
> >
> > Thanks for the info Dave.
> > Unfortunately, I cannot run crash live on the machine because the VM is in
> > a hung state right now. After resetting the VM (by tomorrow), I will check
> > for KASLR and phys_base and try the suggested option.
> >
> > The complete output of crash is below:
> >
> >
> > [root@gt-Server2-gmt user]# crash -d8
> > /home/mfusion/vmem_vmss_jan26/usr/lib/debug/usr/lib/modules/4.14.11-coreos/vmlinux
> > /home/mfusion/vmem_vmss_jan26/usr/lib/modules/4.14.11-coreos/build/System.map
> > /home/mfusion/vmem_vmss_jan26/gt-Server2-gmt-612746ca.vmss
> >
> > crash 7.1.9-3.fc27
> > Copyright (C) 2002-2016 Red Hat, Inc.
> > Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
> > Copyright (C) 1999-2006 Hewlett-Packard Co
> > Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
> > Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
> > Copyright (C) 2005, 2011 NEC Corporation
> > Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
> > Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
> > This program is free software, covered by the GNU General Public License,
> > and you are welcome to change it and/or distribute copies of it under
> > certain conditions. Enter "help copying" to see the conditions.
> > This program has absolutely no warranty. Enter "help warranty" for details.
> >
> > crash: diskdump / compressed kdump: dump does not have panic dump header
> > crash: sadump: read dump device as media format
> > crash: sadump: does not have partition header
> > vmw: Header: id=bed2bed2 version=8 numgroups=95
> > vmw: Checkpoint is 64-bit
> > vmw: Group: Checkpoint offset=0x1dbc size=0x0x3ab.
> > vmw: Group: GuestVars offset=0x2167 size=0x0xa3.
> > vmw: Group: cpuid offset=0x220a size=0x0x5e0e.
> > vmw: Group: cpu offset=0x8018 size=0x0x615bb.
> > vmw: Group: BusMemSample offset=0x695d3 size=0x0x1c.
> > vmw: Group: UUIDVMX offset=0x695ef size=0x0x2e.
> > vmw: Group: StateLogger offset=0x6961d size=0x0x2.
> > vmw: Group: memory offset=0x6961f size=0x0xa8.
> > vmw: Item align_mask[0][0] => position=0x69633 size=0x4: 0000FFFF
> > vmw: Item regionsCount => position=0x69645 size=0x4: 00000002
> > vmw: Item regionPageNum[0] => position=0x6965c size=0x4: 00000000
> > vmw: Item regionPPN[0] => position=0x6966f size=0x4: 00000000
> > vmw: Item regionSize[0] => position=0x69683 size=0x4: 000C0000
> > vmw: Item regionPageNum[1] => position=0x6969a size=0x4: 000C0000
> > vmw: Item regionPPN[1] => position=0x696ad size=0x4: 00100000
> > vmw: Item regionSize[1] => position=0x696c1 size=0x4: 00E40000
> > vmw: Group: MStats offset=0x696c7 size=0x0x1936.
> > vmw: Group: Snapshot offset=0x6affd size=0x0x4b9c.
> > vmw: Group: pic offset=0x6fb99 size=0x0x511.
> > vmw: Group: FTCpt offset=0x700aa size=0x0x2.
> > vmw: Group: ide1:0 offset=0x700ac size=0x0x16e.
> > vmw: Group: scsi0:0 offset=0x7021a size=0x0x46.
> > vmw: Group: Migrate offset=0x70260 size=0x0x2.
> > vmw: Group: TimeTracker offset=0x70262 size=0x0x99.
> > vmw: Group: Backdoor offset=0x702fb size=0x0x2e.
> > vmw: Group: PCI offset=0x70329 size=0x0x13.
> > vmw: Group: Cs440bx offset=0x7033c size=0x0x40539.
> > vmw: Group: ExtCfgDevice offset=0xb0875 size=0x0x30.
> > vmw: Group: Floppy offset=0xb08a5 size=0x0x918c.
> > vmw: Group: AcpiNotify offset=0xb9a31 size=0x0x1b.
> > vmw: Group: vcpuHotPlug offset=0xb9a4c size=0x0xf5.
> > vmw: Group: devHP offset=0xb9b41 size=0x0x86.
> > vmw: Group: ACPIWake offset=0xb9bc7 size=0x0x1b.
> > vmw: Group: DevicesPowerOn offset=0xb9be2 size=0x0x2.
> > vmw: Group: PCIBridge0 offset=0xb9be4 size=0x0x272.
> > vmw: Group: PCIBridge4 offset=0xb9e56 size=0x0x48e.
> > vmw: Group: pciBridge4:1 offset=0xba2e4 size=0x0x48e.
> > vmw: Group: pciBridge4:2 offset=0xba772 size=0x0x48e.
> > vmw: Group: pciBridge4:3 offset=0xbac00 size=0x0x48e.
> > vmw: Group: pciBridge4:4 offset=0xbb08e size=0x0x48e.
> > vmw: Group: pciBridge4:5 offset=0xbb51c size=0x0x48e.
> > vmw: Group: pciBridge4:6 offset=0xbb9aa size=0x0x48e.
> > vmw: Group: pciBridge4:7 offset=0xbbe38 size=0x0x48e.
> > vmw: Group: PCIBridge5 offset=0xbc2c6 size=0x0x48e.
> > vmw: Group: pciBridge5:1 offset=0xbc754 size=0x0x48e.
> > vmw: Group: pciBridge5:2 offset=0xbcbe2 size=0x0x48e.
> > vmw: Group: pciBridge5:3 offset=0xbd070 size=0x0x48e.
> > vmw: Group: pciBridge5:4 offset=0xbd4fe size=0x0x48e.
> > vmw: Group: pciBridge5:5 offset=0xbd98c size=0x0x48e.
> > vmw: Group: pciBridge5:6 offset=0xbde1a size=0x0x48e.
> > vmw: Group: pciBridge5:7 offset=0xbe2a8 size=0x0x48e.
> > vmw: Group: PCIBridge6 offset=0xbe736 size=0x0x48e.
> > vmw: Group: pciBridge6:1 offset=0xbebc4 size=0x0x48e.
> > vmw: Group: pciBridge6:2 offset=0xbf052 size=0x0x48e.
> > vmw: Group: pciBridge6:3 offset=0xbf4e0 size=0x0x48e.
> > vmw: Group: pciBridge6:4 offset=0xbf96e size=0x0x48e.
> > vmw: Group: pciBridge6:5 offset=0xbfdfc size=0x0x48e.
> > vmw: Group: pciBridge6:6 offset=0xc028a size=0x0x48e.
> > vmw: Group: pciBridge6:7 offset=0xc0718 size=0x0x48e.
> > vmw: Group: PCIBridge7 offset=0xc0ba6 size=0x0x48e.
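
A note for readers of this thread: the `regionPPN` / `regionSize` items in the vmw debug lines above can be turned into physical address ranges with a little arithmetic. The sketch below assumes that these fields are counted in pages and that the page size is 4 KiB, as is usual on x86_64. Both are assumptions made for illustration and are not stated in the output itself. `describe` is a hypothetical helper name.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages on x86_64

# (regionPageNum, regionPPN, regionSize) copied from the vmw debug lines
# in this message. Treating them as page numbers / page counts is an
# assumption made for this sketch.
regions = [
    (0x00000000, 0x00000000, 0x000C0000),
    (0x000C0000, 0x00100000, 0x00E40000),
]


def describe(regions, page_size=PAGE_SIZE):
    """Return (start, end, length) in bytes for each region."""
    rows = []
    for _page_num, ppn, size in regions:
        start = ppn * page_size      # first physical byte of the region
        length = size * page_size    # region length in bytes
        rows.append((start, start + length, length))
    return rows


rows = describe(regions)
total_gib = sum(length for _, _, length in rows) / 2**30

for start, end, length in rows:
    print(f"{start:#014x}-{end:#014x}  {length / 2**30:.0f} GiB")
print(f"total: {total_gib:.0f} GiB")
```

Under those assumptions the two regions add up to 60 GiB, which is in line with the "MEMORY: 60 GB" line that `sys` reports at the top of this thread.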
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility