Re: [PATCH] crash: Do not use bt -t flag in panic_search()

----- Original Message -----

> > > 
> > > With the zgetdump tool we create live dumps from /dev/mem or /dev/crash.
> > > These dumps get the LIVE_DUMP flag indicating that data is not
> > > consistent.
> > > 
> > > Besides this, we have two other non-disruptive live dump features:
> > > 
> > >   - VMDUMP for z/VM guests
> > >   - Virsh dump for KVM guests
> > > 
> > > In contrast to the zgetdump method, here the guest system is stopped
> > > to get consistent snapshots. Therefore I think it is fine to *not* set
> > > the LIVE_DUMP flag.
> > > 
> > > Besides those live dump mechanisms (and kdump), we have our stand-alone dump
> > > tools for DASD and SCSI. These dump methods are also "Linux independent" and
> > > can therefore produce dumps without panic tasks.
> > > 
> > > You can read more on s390 dump in the documents below:
> > > 
> > >  * http://www.vm.ibm.com/education/lvc/LVC1219.pdf
> > >  * http://www-01.ibm.com/support/knowledgecenter/linuxonibm/liaaf/lnz_r_dt.html?cp=linuxonibm%2F0-4-0-1
> > > 
> > > Michael
> > 
> > OK, so from what I understand, there still can be s390x dumpfiles which have no indication
> > of the panic task or cpu (if there is one) in their headers, and therefore may try the "bt -r"
> > type search of the active tasks via raw_stack_dump() in get_active_set_panic_task(),
> > and if that fails, fall back to the "bt -t" search of all tasks in panic_search().
> > 
> > In those cases, I suppose you could:
> > 
> >  (1) restrict the raw_stack_dump() parameters in get_active_set_panic_task() to exclude
> >      the user register dump at the top of the stack, and
> >  (2) plug in a MACHDEP_BT_TEXT handler for the s390x instead of using the generic version,
> >      and in that case, could prevent the search from entering the user-space register dump
> >      at the top of the stack, or
> > (2a) replace "bt -t" with just "bt" in panic_search() for s390x as you did in the original
> >      patch.
> > 
> > But (1) and (2) are not fool-proof, because even the kernel-only part of the stack could
> > simply contain "numbers" that by dumb luck fall into the zero-based virtual address
> > range of panic, crash_kexec, etc., and return a false positive.  So I don't know
> > how that can be made absolutely reliable.
> 
> I still would prefer 2a. See patch below.

OK, that's fine with me.
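
To illustrate the false-positive concern quoted above, here is a minimal sketch of a raw stack scan that treats every stack word as a possible text address. It is not crash's code: the struct, both helper functions, and the symbol bounds a caller would pass in are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical text-symbol range; any bounds used with it are invented. */
struct text_sym {
    const char *name;
    uint64_t start;   /* first address of the function */
    uint64_t end;     /* one past the last address */
};

/* Return the symbol whose range contains value, or NULL if none does. */
static const struct text_sym *
lookup_text(const struct text_sym *syms, size_t nsyms, uint64_t value)
{
    for (size_t i = 0; i < nsyms; i++) {
        if (value >= syms[i].start && value < syms[i].end)
            return &syms[i];
    }
    return NULL;
}

/*
 * Count the stack words that resolve to the symbol called name.
 * Every word is treated as a possible text address; nothing checks
 * that it really is a return address in a valid frame chain.
 */
static size_t
count_text_hits(const uint64_t *stack, size_t nwords,
                const struct text_sym *syms, size_t nsyms,
                const char *name)
{
    size_t hits = 0;

    assert(stack != NULL && syms != NULL && name != NULL);
    for (size_t i = 0; i < nwords; i++) {
        const struct text_sym *sym = lookup_text(syms, nsyms, stack[i]);

        if (sym != NULL && strcmp(sym->name, name) == 0)
            hits++;
    }
    return hits;
}
```

With an invented range for "panic" that covers 0x83679a, a stack holding that value as an ordinary local variable is reported as a hit for "panic", which is the kind of match that cannot be told apart from a real return address by value alone.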

> 
> > 
> > But at least with dumpfiles that have the live dump magic number (and I'm still
> > not clear which of the 4 types do so),
> 
> Only the zgetdump live dump gets the live dump magic number.

OK, thanks for the clarification -- I'll update the changelog to indicate that.

Queued for crash-7.1.3:

  https://github.com/crash-utility/crash/commit/3c2fc5f2a027fe192327101cdc6db0e24a4794d9

Thanks,
  Dave




> > the simple LIVE_DUMP-check patch covers
> > them.  I'm not sure whether it's worth doing anything beyond that.
> ---
> crash: Do not use bt -t flag in panic_search()
> 
> On s390 we got a dump where a process "gmain" was incorrectly marked as
> the running panic task:
> 
> crash> ps | grep gmain
> >   217      1   5      8bec23420     IN   0.0  463276  18240  gmain
> 
> The reason was that the "brute force" way of parsing the "bt -t -o"
> output in panic_search() found the symbol "panic" on the stack:
> 
> crash> bt -t -o 8bec23420
> PID: 217    TASK: 8bec23420         CPU: 5   COMMAND: "gmain"
>               START: __schedule at 83f650
>   [       8b662b900] (null) at 0
>   [       8b662b978] __schedule at 83f650
> ...
>   [       8b662bb18] (null) at 0
>   [       8b662bb40] panic at 83679a  <<<<<--------------
> 
> The real stack trace was as follows:
> 
> crash> bt  8bec23420
> Detaching after fork from child process 15508.
> PID: 217    TASK: 8bec23420         CPU: 5   COMMAND: "gmain"
>  #0 [8b662b8f0] __schedule at 83f650
>  #1 [8b662b958] schedule at 83fade
>  #2 [8b662b970] schedule_hrtimeout_range_clock at 842fc8
>  #3 [8b662ba10] poll_schedule_timeout at 2c6e8a
>  #4 [8b662ba30] do_sys_poll at 2c8604
>  #5 [8b662be40] sys_poll at 2c8852
>  #6 [8b662bea8] system_call at 843a66
> 
> The value 0x83679a (panic at 83679a) was a local variable on the stack
> and was interpreted incorrectly as a function call to "panic".
> 
> Especially on s390 there are dump methods, e.g. VMDUMP or stand-alone dump,
> where the "bt -t -o" method is used to find the panic task. Because
> the "-t" method is quite risky, we use the "normal" stack
> backtrace without the "-t" bt option for s390.
> 
> Signed-off-by: Michael Holzheu <holzheu@xxxxxxxxxxxxxxxxxx>
> ---
>  task.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> --- a/task.c
> +++ b/task.c
> @@ -6633,7 +6633,11 @@ panic_search(void)
>          fd = &foreach_data;
>  	fd->keys = 1;
>  	fd->keyword_array[0] = FOREACH_BT;
> +#ifdef S390X
> +	fd->flags |= FOREACH_o_FLAG;
> +#else
>  	fd->flags |= (FOREACH_t_FLAG|FOREACH_o_FLAG);
> +#endif
>  
>  	dietask = lasttask = NO_TASK;
>  	
> 
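
The effect of the hunk above can be restated as a small stand-alone function. This is only a sketch: the DEMO_* bit values are placeholders (crash's real FOREACH_o_FLAG and FOREACH_t_FLAG definitions may differ), and demo_panic_search_flags() is a hypothetical helper that takes a runtime argument in place of the compile-time S390X check used in the patch.

```c
#include <assert.h>

/* Placeholder bit values; crash's real FOREACH_*_FLAG values may differ. */
#define DEMO_FOREACH_o_FLAG 0x1UL
#define DEMO_FOREACH_t_FLAG 0x2UL

/*
 * Return the flags the patched panic_search() would set:
 * "-o" on every architecture, plus "-t" everywhere except s390x.
 */
static unsigned long
demo_panic_search_flags(int is_s390x)
{
    unsigned long flags = DEMO_FOREACH_o_FLAG;

    assert(is_s390x == 0 || is_s390x == 1);
    if (!is_s390x)
        flags |= DEMO_FOREACH_t_FLAG;
    return flags;
}
```

The patch itself makes the choice at build time with #ifdef S390X, so no runtime argument exists in the real code.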
