Re: linux_banner has garbage

Dave Anderson <anderson@xxxxxxxxxx> · Fri, 23 Feb 2018 09:41:17 -0500 (EST)

----- Original Message -----
> Thank you for the guidance Dave.
> 
> I have two questions regarding runq.
> 
> 1. Could you please let me know how the active task on some CPUs can have been
> running for longer than the uptime?

I'm not sure, other than that the uptime is calculated from jiffies, whereas
the runq -m option uses each run queue's per-cpu timestamp.
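
For what it's worth, the size of the discrepancy is easy to quantify. The
following is only a rough Python sketch, run outside of crash, that converts
the runq -m stamp of one of your idle cpus and the UPTIME line from your sys
output to seconds. The helper names are made up for illustration, and it does
not explain the cause, it only measures the gap:

```python
import re


def runq_m_seconds(stamp: str) -> float:
    """Convert a 'D HH:MM:SS.mmm' stamp, as shown by runq -m, to seconds."""
    days, clock = stamp.split()
    hours, minutes, seconds = clock.split(":")
    return (int(days) * 86400 + int(hours) * 3600
            + int(minutes) * 60 + float(seconds))


def uptime_seconds(uptime: str) -> int:
    """Convert an UPTIME value such as '1 days, 11:52:25' to seconds."""
    match = re.fullmatch(r"(?:(\d+) days?, )?(\d+):(\d+):(\d+)", uptime.strip())
    if match is None:
        raise ValueError(f"unrecognized uptime format: {uptime!r}")
    days, hours, minutes, seconds = match.groups()
    return (int(days or 0) * 86400 + int(hours) * 3600
            + int(minutes) * 60 + int(seconds))


# Values copied from the runq -m (CPU 1) and sys output in this thread.
excess = runq_m_seconds("1 12:10:42.840") - uptime_seconds("1 days, 11:52:25")
print(f"active task on CPU 1 exceeds uptime by {excess:.2f} seconds")
```

With your numbers the idle tasks appear to have been running about 1098
seconds (a bit over 18 minutes) longer than the jiffies-based uptime.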

> 
> crash> runq -m
> CPU 0: [0 00:23:29.808] PID: 529 TASK: ffff88079d0d1e40 COMMAND: "kworker/u141:1"
> CPU 1: [1 12:10:42.840] PID: 0 TASK: ffff88079df48000 COMMAND: "swapper/1"
> CPU 2: [1 12:10:42.841] PID: 0 TASK: ffff88079df4bc80 COMMAND: "swapper/2"
> CPU 3: [1 12:10:42.841] PID: 0 TASK: ffff88079df4dac0 COMMAND: "swapper/3"
> CPU 4: [1 12:10:42.841] PID: 0 TASK: ffff88079df49e40 COMMAND: "swapper/4"
> CPU 5: [1 12:10:42.841] PID: 0 TASK: ffff88079df58000 COMMAND: "swapper/5"
> 
> crash> sys
> KERNEL: ./usr/lib/debug/usr/lib/modules/4.14.19-coreos/vmlinux
> DUMPFILE: gt-user2-gmt-612746ca.vmss
> CPUS: 70
> DATE: Wed Feb 21 14:53:20 2018
> UPTIME: 1 days, 11:52:25
> LOAD AVERAGE: 70.70, 30.98, 12.88
> TASKS: 2312
> NODENAME: gt-user2-gmt.com
> RELEASE: 4.14.19-coreos
> VERSION: #1 SMP Wed Feb 14 03:18:05 UTC 2018
> MACHINE: x86_64 (2094 Mhz)
> MEMORY: 60 GB
> PANIC: ""
> crash>
>
> 2. Is there a way to find out why some CPUs have a time lag in their run queue?

I don't know, but I would certainly look at the task backtraces of the cpus
that have the large lag values. 
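
If it helps, picking those cpus out of the lag listing can be scripted. This
is just a sketch in Python, run outside of crash, assuming the listing keeps
the "CPU N: X secs" format you pasted. The function names and the 1-second
threshold are arbitrary choices of mine. It prints one "bt -c <cpu>" command
per lagging cpu, which can then be entered at the crash prompt:

```python
import re

# Matches lines such as "CPU 16: 84.22 secs", with or without "> " quoting.
LAG_LINE = re.compile(r"CPU\s+(\d+):\s+([\d.]+)\s+secs")


def lagging_cpus(listing: str, threshold: float = 1.0) -> list[int]:
    """Return the cpu numbers whose lag is at least `threshold` seconds."""
    cpus = []
    for line in listing.splitlines():
        match = LAG_LINE.search(line)
        if match and float(match.group(2)) >= threshold:
            cpus.append(int(match.group(1)))
    return sorted(cpus)


def bt_commands(cpus: list[int]) -> list[str]:
    """Build one crash 'bt -c <cpu>' command per cpu number."""
    return [f"bt -c {cpu}" for cpu in cpus]


# A few lines taken from the lag listing quoted in this thread.
SAMPLE = """\
CPU 32: 0.00 secs
CPU 0: 0.01 secs
CPU 16: 84.22 secs
CPU 66: 268.75 secs
CPU 7: 268.75 secs
"""

for command in bt_commands(lagging_cpus(SAMPLE)):
    print(command)
```

The threshold only separates the cpus that are essentially current from the
ones that have fallen behind, so any value between your 0.01 and 84.22 second
entries would select the same set.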

> 
> CPU 32: 0.00 secs
> CPU 65: 0.00 secs
> CPU 54: 0.00 secs
> CPU 0: 0.01 secs
> CPU 16: 84.22 secs
> CPU 66: 268.75 secs
> CPU 58: 268.75 secs
> CPU 57: 268.75 secs
> CPU 43: 268.75 secs
> CPU 20: 268.75 secs
> CPU 7: 268.75 secs
> 
> crash>

> I'm struggling to find out why my VM hung (unresponsive to ping/ssh, with a
> couple of CPUs at 100% utilization).
> 
> -Eshak
> 
> On Thu, Feb 22, 2018 at 6:27 AM, Dave Anderson <anderson@xxxxxxxxxx> wrote:
> 
> 
> ----- Original Message -----
> > Hello Dave,
> > 
> > I got a kernel freeze yesterday and am able to successfully open the memory
> > image using crash utility.
> > 
> > crash> sys
> > KERNEL: ./usr/lib/debug/usr/lib/modules/4.14.19-coreos/vmlinux
> > DUMPFILE: gt-Server02-gmt-612746ca.vmss
> > CPUS: 70
> > DATE: Wed Feb 21 14:53:20 2018
> > UPTIME: 1 days, 11:52:25
> > LOAD AVERAGE: 70.70, 30.98, 12.88
> > TASKS: 2312
> > NODENAME: gt-Server02-gmt.com
> > RELEASE: 4.14.19-coreos
> > VERSION: #1 SMP Wed Feb 14 03:18:05 UTC 2018
> > MACHINE: x86_64 (2094 Mhz)
> > MEMORY: 60 GB
> > PANIC: ""
> > crash>
> > 
> > Could you please point me to a couple of things I should check in the case of
> > a kernel freeze before diving deep to find the root cause?
> 
> I'm not sure what you mean by a "kernel freeze", but typically something
> would complain about a hard or soft lockup in the system log. So I would
> first run "log" to see if there's anything of interest. Run "bt -a" on
> the active tasks to see if the active tasks are contesting for something,
> or work your way through "foreach bt" to see what the tasks of interest are
> doing/waiting on. It would seem that some task has taken control of something,
> a lock, or counter, or whatever, and many other tasks have blocked waiting
> for its release. So there's probably a common theme among the blocked tasks
> that might give you a clue.
> 
> Dave
> 
> --
> Crash-utility mailing list
> Crash-utility@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/crash-utility
> 
> 
