Re: [PATCH] crash: Do not use bt -t flag in panic_search()

----- Original Message -----
> 
> On Thu, 6 Aug 2015 11:25:29 -0400 (EDT)
> Dave Anderson <anderson@xxxxxxxxxx> wrote:
> 
> > Re: your dumpfile where the erroneous "panic" address in a random user
> > task's exception frame register set gets picked up by mistake.
> > 
> > Your original patch request modified the "bt" command used for the
> > kernel stack searches in panic_search().  But that piece of code
> > is the last-ditch effort for finding a panic task, which follows
> > this path:
> > 
> >   get_panic_context()
> >     panic_search()
> >       get_dumpfile_panic_task()
> >         get_kdump_panic_task()       (requires kdump "crashing_cpu" symbol)
> >         get_diskdump_panic_task()    (requires kdump "crashing_cpu" symbol)
> 
> On s390 we don't have the "crashing_cpu" symbol in the kernel.
> 
> >         get_active_set_panic_task()  (bt -r raw stack dump of active cpus)
> >     ...
> >       
> > Only if all of the above fail, does panic_search() initiate the
> > exhaustive walkthrough of all kernel stacks for evidence.
> > 
> > Since you have gotten that far, I'm wondering whether your
> > target dumpfile with the faulty "panic" address is from an
> > s390x "live dump"?  In that case, there can never be any task
> > with any such evidence, making the backtrace search a waste of
> > time to begin with.
> 
> The "problem" dump is an s390 stand-alone dump of a hanging system.
> All CPUs have been in "psw_idle" when the dump was generated:
> 
> PID: 0      TASK: c50f38            CPU: 0   COMMAND: "swapper/0"
>  LOWCORE INFO:
>   -psw      : 0x0706c00180000000 0x000000000084410e
>   -function : psw_idle at 84410e
> 
> [snip]
> 
>  #0 [00c1fe70] arch_cpu_idle at 104d4a
>  #1 [00c1fe90] cpu_startup_entry at 180430
>  #2 [00c1fee8] start_kernel at d1fb10
>  #3 [00c1ff60] _stext at 100020
> 
> 
> > 
> > And if so, I'm thinking that since s390x will have the LIVE_DUMP
> > flag set, if get_dumpfile_panic_task() returns NO_TASK, then
> > panic_search() should just return a NULL to get_panic_context()
> > if it's a live dump, which will just default to the idle task on
> > cpu 0.
> 
> Although it does not solve the above problem it makes sense for
> live dumps. What about the following patch?
> ---
> crash: do not search panic tasks for live dumps
> 
> Always return "NO_TASK" if the "LIVE_DUMP" flag is set because live dumps
> cannot have a panic task.
> 
> Signed-off-by: Michael Holzheu <holzheu@xxxxxxxxxxxxxxxxxx>
> ---
>  task.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> --- a/task.c
> +++ b/task.c
> @@ -6726,7 +6726,10 @@ get_dumpfile_panic_task(void)
>  {
>  	ulong task;
>  
> -	if (NETDUMP_DUMPFILE()) {
> +	if (pc->flags2 & LIVE_DUMP) {
> +		/* No panic task because system itself created the dump */
> +		return NO_TASK;
> +	} else if (NETDUMP_DUMPFILE()) {
>  		task = pc->flags & REM_NETDUMP ?
>  			tt->panic_task : get_netdump_panic_task();
>  		if (task)
> 

That makes sense, but I'm going to move the LIVE_DUMP check farther down
in get_dumpfile_panic_task() to just before the get_active_set() call.

The reason is that another type of "LIVE_DUMP" comes from the snap.so extension
module, and in that case get_kdump_panic_task() finds and returns the "crash"
task that was running the snap command on the live system.
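
To illustrate the ordering described above, here is a minimal stand-alone
sketch. It is not the actual task.c change: the sketch_* names, the context
struct, and the NO_TASK and LIVE_DUMP values are placeholders assumed only for
this example, and the two lookup helpers are stubs standing in for
get_kdump_panic_task() and the active-set search.

```c
typedef unsigned long ulong;

#define NO_TASK   0UL   /* placeholder value, assumed for this sketch */
#define LIVE_DUMP 0x1UL /* placeholder bit, not crash's real flag value */

/* Hypothetical stand-in for the pieces of crash state the check needs. */
struct sketch_context {
	ulong flags2;          /* carries LIVE_DUMP when the dump is a live dump */
	ulong kdump_task;      /* result the kdump lookup stub will report */
	ulong active_set_task; /* result the active-set lookup stub will report */
};

/* Stub for get_kdump_panic_task(): for a snap.so dump this finds "crash". */
static ulong sketch_kdump_panic_task(const struct sketch_context *c)
{
	return c->kdump_task;
}

/* Stub for the raw stack search over the active cpus. */
static ulong sketch_active_set_panic_task(const struct sketch_context *c)
{
	return c->active_set_task;
}

ulong sketch_get_dumpfile_panic_task(const struct sketch_context *c)
{
	ulong task;

	/* Run the dumpfile-specific lookup first, even for live dumps. */
	task = sketch_kdump_panic_task(c);
	if (task)
		return task;

	/* Only now give up on live dumps, just before the active-set search. */
	if (c->flags2 & LIVE_DUMP)
		return NO_TASK;

	return sketch_active_set_panic_task(c);
}
```

With the check placed this late, a snap.so-style live dump still gets its
"crash" task back from the first lookup, while a live dump with no such task
returns NO_TASK without ever reaching the active-set search.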

Clarify something else for me: are there actually two types of live dumps
that can be taken by an s390x?  There is the "zgetdump" facility, but is
there also another type that is taken by the firmware and/or the hypervisor?

Dave


--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility


