Re: Figure out what killed an app (rhbz#2253099)

Yanko Kaneti <yaneti@xxxxxxxxxxx> · Fri, 02 Feb 2024 18:52:22 +0200

On Thu, 2024-02-01 at 09:44 +0100, Ondrej Mosnáček wrote:
> On Thu, 1 Feb 2024 at 09:13, Milan Crha <mcrha@xxxxxxxxxx> wrote:
> > The kernel tracing log for sig==9 shows:
> > 
> > gnome-terminal--2924    [002] dN.2.  2520.462889: signal_generate:
> > sig=9 errno=0 code=128 comm=alloc-too-much pid=3502 grp=1 res=0
> > 
> > There is no such thing (apart from the tracing log) when Evolution is
> > suddenly killed; the logs are silent. That makes me believe it's not the
> > OOM killer that kills the application. I'm back at square 1; or maybe
> > square 2, as I know the kernel possibly sends it, but not why.
> 
> Try running `echo stacktrace >/sys/kernel/tracing/trace_options` (as
> root) and then collect the kernel trace again. That should give you a
> backtrace of kernel functions from the signal generation, which could
> help you/us to figure out the reason the process was killed.

So, I figured the easiest way to trigger the kill here is to put load
on the machine:

 $ stress-ng --cpu -1 --cpu-method all -t 5m --cpu-load 95

Starting Evolution then seems to trigger it almost every time, shortly after
startup (I have around 200k messages in the folder it's trying to display).

I've enabled the stacktrace tracing option and, like Milan, set a sig==9
filter. Here is what I got in the trace buffer after it was killed:

# tracer: nop
#
# entries-in-buffer/entries-written: 34/34   #P:16
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
       evolution-9096    [002] d..1.  1207.016489: signal_generate: sig=9 errno=0 code=128 comm=evolution pid=9096 grp=1 res=0
       evolution-9096    [002] d..1.  1207.016495: <stack trace>
 => trace_event_raw_event_signal_generate
 => __send_signal_locked
 => posix_cpu_timers_work
 => task_work_run
 => irqentry_exit_to_user_mode
 => asm_sysvec_apic_timer_interrupt
       evolution-9096    [002] d..2.  1207.016564: signal_generate: sig=9 errno=0 code=0 comm=bwrap pid=9145 grp=1 res=0
       evolution-9096    [002] d..2.  1207.016568: <stack trace>
 => trace_event_raw_event_signal_generate
 => __send_signal_locked
 => do_send_sig_info
 => do_exit
 => do_group_exit
 => get_signal
 => arch_do_signal_or_restart
 => irqentry_exit_to_user_mode
 => asm_sysvec_apic_timer_interrupt
...
and 32 other events of bwrap cleaning up.
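
For reference, the tracing setup used to get the output above (the
stacktrace option plus a sig==9 filter on the signal_generate event) can
be sketched roughly as below. The function name setup_sig9_trace and its
optional directory argument are illustrative only, not something from this
thread; the paths are the usual tracefs ones under /sys/kernel/tracing,
and writing to them needs root.

```shell
# Sketch only: trace SIGKILL generation with kernel stack traces.
# setup_sig9_trace and its optional argument are hypothetical names;
# the argument defaults to the standard tracefs mount point.
setup_sig9_trace() {
    tracefs="${1:-/sys/kernel/tracing}"
    event="$tracefs/events/signal/signal_generate"

    # Record a kernel backtrace with every traced event.
    echo stacktrace > "$tracefs/trace_options" || return 1
    # Only record events where the signal number is 9 (SIGKILL).
    echo 'sig == 9' > "$event/filter" || return 1
    # Turn the tracepoint on.
    echo 1 > "$event/enable" || return 1
    # Truncating the trace file clears old entries from the buffer.
    : > "$tracefs/trace"
}
```

Run setup_sig9_trace as root, reproduce the kill, then read
/sys/kernel/tracing/trace.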

Does that shed any light? Other than that it seems to be sending the signal
to itself.
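
To make the "who sent what to whom" part easier to read, the
signal_generate lines can be summarised with a small helper like the one
below. This is only a sketch: decode_si_code and summarize_signals are
made-up names, and only the two code= values that show up in this trace
are decoded (0 is SI_USER and 128, i.e. 0x80, is SI_KERNEL in the Linux
siginfo definitions). It also assumes comm names without spaces.

```shell
# Map the "code=" field of a signal_generate event to its si_code name.
# Only the two values seen in the trace are handled explicitly.
decode_si_code() {
    case "$1" in
        0)   echo "SI_USER" ;;
        128) echo "SI_KERNEL" ;;
        *)   echo "other ($1)" ;;
    esac
}

# Read trace lines on stdin and print one line per signal_generate event:
#   <task the event was logged in> -> <target comm>-<target pid> sig=N <si_code>
# Stack trace lines and comment lines do not match and are skipped.
summarize_signals() {
    sed -n 's/^ *\([^ ]*\) .*signal_generate: sig=\([0-9]*\) .* code=\([0-9]*\) comm=\([^ ]*\) pid=\([0-9]*\) .*/\1 \2 \3 \4 \5/p' |
    while read -r sender sig code comm pid; do
        echo "$sender -> $comm-$pid sig=$sig $(decode_si_code "$code")"
    done
}
```

Feeding it the trace file (summarize_signals < /sys/kernel/tracing/trace)
would list the first event as evolution-9096 targeting itself with
SI_KERNEL, i.e. the kernel generated the signal while running in the
context of the evolution task, and the second as a SI_USER kill of bwrap
sent during evolution's exit.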

-Yanko