[PATCH] ARC: prevent showing irrelevant exception info in signal message

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Vineet,

On Thu, 2018-07-05 at 14:26 -0700, Vineet Gupta wrote:
> On 07/03/2018 03:57 AM, Eugeniy Paltsev wrote:
> > On Mon, 2018-07-02 at 10:57 -0700, Vineet Gupta wrote:
> > > +CC Al
> > > 
> > > On 06/29/2018 12:39 PM, Eugeniy Paltsev wrote:
> > > > We process signals in the end of syscall/exception handler.
> > > > It the signal is fatal we print register's content using
> > > > show_regs function. show_regs() also prints information about
> > > > last exception happened.
> > > > 
> > > > In case of multicore system we can catch the situation when we
> > > > will print wrong information about exception. See the example:
> > > > ______________________________
> > > > CPU-0: started to handle page fault
> > > > CPU-1: sent signal to process, which is executed on CPU-0
> > > > CPU-0: ended page fault handle. Started to process signal before
> > > >        returnig to userspace. Process signal, which is send from
> > > >        CPU-0. As th signal is fatal we call show_regs().
> > > >        show_regs() will show information about last exception
> > > >        which is *page fault* (instead of "trap" which is used for
> > > >        signals and happened on CPU-0)
> > > > 
> > > > So we will get message like this:
> > > >     /home/waitpid02
> > > >   potentially unexpected fatal signal 8.
> > > >   Path: /home/waitpid02
> > > >   CPU: 0 PID: 100 Comm: waitpid02 Not tainted 4.10.0-rc4 #2
> > > >   task: 9f11c200 task.stack: 9f3ae000
> > > > 
> > > >   [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000123ec
> > > >   [EFA   ]: 0x00000000
> > > >   [BLINK ]: 0x123ea
> > > >   [ERET  ]: 0x123ec
> > > >     @off 0x123ec in [/home/waitpid02]
> > > >     VMA: 0x00010000 to 0x00016000
> > > >   [STAT32]: 0x80080882 : IE U
> > > >   BTA: 0x000123ea  SP: 0x5ffd3db0  FP: 0x00000000
> > > >   LPS: 0x20031684 LPE: 0x2003169a LPC: 0x00000006
> > > >   [-----other-info-----]
> > > > 
> > > > This message is confusing because it show information about page fault
> > > > ( [ECR   ]: 0x00050200 => Invalid Write ) which is absolutely irrelevant
> > > > to signal.
> > > 
> > > Agreed this is misleading. @Al, is there a way to identify process termination
> > > from signals because it did something wrong vs. say unhandled signal. For former,
> > > we want to dump additional info in show_regs() such as PC / Fault addres etc and
> > > not in other scenario.
> > > 
> > > > This situation was reproduced with waitpid02 LTP test.
> > > > _____________________________
> > > > 
> > > > So remove printing information about exceptions from show_regs()
> > > > to avoid confusing messages. Print information about exceptions
> > > > only in required places instead of show_regs()
> > > > 
> > > > Now we don't print information about exceptions if signal is simply
> > > > send by another userspace app. So in case of waitpid02 we will print
> > > > next message:
> > > > _____________________________
> > > >     ./waitpid02
> > > >   potentially unexpected fatal signal 8.
> > > >   [STAT32]: 0x80080082 : IE U
> > > >   BTA: 0x20000fc4	 SP: 0x5ff8bd64	 FP: 0x00000000
> > > >   LPS: 0x200524a0	LPE: 0x200524b6	LPC: 0x00000006
> > > >   [-----other-info-----]
> > > > _____________________________
> > > 
> > > The prints I'm seeing now, for a segv from NULL pointer access is even more
> > > confusing !
> > > There's a mixup of prints....
> > > 
> > > -------------------->8--------------------
> > > Path: /segv
> > > CPU: 0 PID: 70 Comm: segv Not tainted 4.17.0+ #412
> > > 
> > > [ECR   ]: 0x00050200 => Invalid Write @ 0x00000000 by insn @ 0x000103ac
> > > [EFA   ]: 0x00000000
> > > [BLINK ]: 0x20047bb0
> > > [ERET  ]: 0x103ac
> > >     @off 0x103ac in [/segv]
> > >     VMA: 0x00010000 to 0x00012000
> > > 
> > > potentially unexpected fatal signal 11.
> > > [STAT32]: 0x80080882 : IE U    
> > > BTA: 0x00010398     SP: 0x5fc95e1c     FP: 0x5fc95e20
> > > LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
> > > r00: 0x00000001    r01: 0x5fc95e94    r02: 0x00000000   
> > > r03: 0x00000064    r04: 0x80808080    r05: 0x2f2f2f2f   
> > > ...
> > > -------------------->8--------------------
> > > 
> > > and for the process killed by signal 8, we get below.
> > > 
> > > -------------------->8--------------------
> > > [ARCLinux]# kill -8 71
> > > [ARCLinux]# potentially unexpected fatal signal 8.
> > > [STAT32]: 0x80080882 : IE U    
> > > BTA: 0x20020660     SP: 0x5fbcddec     FP: 0x5fbcde1c
> > > LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
> > > r00: 0xfffffdfc    r01: 0x5fbcddf0    r02: 0x00000000   
> > > r03: 0x00000008    r04: 0x80808080    r05: 0x2f2f2f2f   
> > > r06: 0x7a2f5f4a    r07: 0x00000000    r08: 0x00000065   
> > > ...
> > > 
> > > 
> > > [1]+  Floating point exception   ./sleep
> > > -------------------->8--------------------
> > > I'm not sure whats the improvement here vs. the status quo.
> > 
> > Why do you think this is confusing?
> > The main change is that we don't print exception registers for signal based kill.
> 
> For the pure signal based termination, what is the point of printing the rest of
> registers. If you say "it is to give a feel of what the userspace was doing at the
> time...." then we are lacking the most crucial piece which is the PC at the time
> (i.e. ERET placeholder).

We can disable printing the rest of registers by setting "print-fatal-signals" to 0
by default. In that case we will print rest of registers only for exceptions
happened in kernel space (when die() function is called)

So in case of setting "print-fatal-signals" to 0 we will get following messages:

NULL pointer access from user space:
---------->8-------------
# ./arc_hell &
# Path: /root/arc_hell
CPU: 0 PID: 79 Comm: arc_hell Not tainted 4.17.0+ #10

[ECR   ]: 0x00050100 => Invalid Read @ 0x00000000 by insn @ 0x2003a35c
[EFA   ]: 0x00000000
[BLINK ]: 0x20039ef8
[ERET  ]: 0x2003a35c
    @off 0x2e35c in [/lib/libuClibc-1.0.18.so]
    VMA: 0x2000c000 to 0x20072000

[1]+  Segmentation fault         ./arc_hell
---------->8-------------

Process killed by signal 11 via 'kill -s 11':
---------->8-------------
# sleep 1000 &
[1]+  Segmentation fault         sleep 1000
---------->8-------------



Probably the best variant is to print only information about unexpected exception
from exception handler and leave printing ECR/EFA/BLINK/ERET in show_regs().

So in case of exception based signal we will print from exception handler
something like that:
---------->8-------------
Unexpected exception: Invalid Read @ 0x00000000 by insn @ 0x2003a35c
---------->8-------------

Other information will be print in show_regs (only if "print-fatal-signals"
is enabled):
---------->8-------------
potentially unexpected fatal signal 11.
Path: /root/arc_hell
CPU: 0 PID: 79 Comm: arc_hell Not tainted 4.17.0+ #10
[ECR   ]: 0x00050100
[EFA   ]: 0x00000000
[BLINK ]: 0x20039ef8
[ERET  ]: 0x2003a35c
    @off 0x2e35c in [/lib/libuClibc-1.0.18.so]
    VMA: 0x2000c000 to 0x20072000
[STAT32]: 0x80080882 : IE U
BTA: 0x00010398     SP: 0x5fc95e1c     FP: 0x5fc95e20
LPS: 0x20039ffc    LPE: 0x2003a000    LPC: 0x00000000
r00: 0x00000001    r01: 0x5fc95e94    r02: 0x00000000
r03: 0x00000064    r04: 0x80808080    r05: 0x2f2f2f2f
...
---------->8-------------


> > Moreover, new behavior is more like x86-64 behavior. See the example:
> 
> For a null pointer based termination, we now have a ugly looking "potentially
> unexpected..." print in the middle of reg file dump, that is not what x86 does.
> Anyways, that print It is undesirable, but not a deal breaker.  The issue is point
> above, can we remedy it.
> 
> BTW in your original patch, for a null pointer access, the printing code now
> allocates 2 pages, in each of show_xxx routines and the one in show_regs() is now
> pointless, as it is not used there at all there - so please fix that as well.

Thanks for pointing, will fix in v2 patch.

> -Vineet
> 
-- 
 Eugeniy Paltsev


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux