Re: Figure out what killed an app (rhbz#2253099)

Ondrej Mosnáček <omosnacek@xxxxxxxxx> · Sat, 3 Feb 2024 13:29:40 +0100

On Fri, 2 Feb 2024 at 17:52, Yanko Kaneti <yaneti@xxxxxxxxxxx> wrote:
>
> On Thu, 2024-02-01 at 09:44 +0100, Ondrej Mosnáček wrote:
> > On Thu, 1 Feb 2024 at 09:13, Milan Crha <mcrha@xxxxxxxxxx> wrote:
> > > The kernel tracing log for sig==9 shows:
> > >
> > > gnome-terminal--2924    [002] dN.2.  2520.462889: signal_generate:
> > > sig=9 errno=0 code=128 comm=alloc-too-much pid=3502 grp=1 res=0
> > >
> > > There is no such thing (apart from the tracing log) when Evolution is
> > > suddenly killed; the logs are silent. That makes me believe it's not
> > > the OOM killer that kills the application. I'm back at square 1; or
> > > maybe square 2, as I know the kernel possibly sends it, but not why.
> >
> > Try running `echo stacktrace >/sys/kernel/tracing/trace_options` (as
> > root) and then collect the kernel trace again. That should give you a
> > backtrace of kernel functions from the signal generation, which could
> > help you/us to figure out the reason the process was killed.
>
> So, figured the easiest way to help trigger the kill here is to put load
> on the machine.
>
>  $ stress-ng --cpu -1 --cpu-method all -t 5m --cpu-load 95
>
> then starting Evolution triggers it almost every time, shortly after
> startup (I have around 200k messages in the folder it's trying to display).
>
> I've enabled the stacktrace tracing option and, like Milan, set a sig==9
> filter. Here is what I got in the trace buffer after it was killed:
>
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 34/34   #P:16
> #
> #                                _-----=> irqs-off/BH-disabled
> #                               / _----=> need-resched
> #                              | / _---=> hardirq/softirq
> #                              || / _--=> preempt-depth
> #                              ||| / _-=> migrate-disable
> #                              |||| /     delay
> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> #              | |         |   |||||     |         |
>        evolution-9096    [002] d..1.  1207.016489: signal_generate: sig=9 errno=0 code=128 comm=evolution pid=9096 grp=1 res=0
>        evolution-9096    [002] d..1.  1207.016495: <stack trace>
>  => trace_event_raw_event_signal_generate
>  => __send_signal_locked
>  => posix_cpu_timers_work
>  => task_work_run
>  => irqentry_exit_to_user_mode
>  => asm_sysvec_apic_timer_interrupt

So, browsing through the relevant kernel code, it seems the only cases
which could have led to this backtrace are:
1. When a real-time task's accumulated RT run time (tsk->rt.timeout)
goes over the RLIMIT_RTTIME hard limit (see function
check_thread_timers() in kernel/time/posix-cpu-timers.c).
2. When the process's total CPU time (summed over all its threads) goes
over the RLIMIT_CPU hard limit (see function check_process_timers() in
kernel/time/posix-cpu-timers.c).

I may have missed some code path, but these resource limits should be
the next thing to check.

>        evolution-9096    [002] d..2.  1207.016564: signal_generate: sig=9 errno=0 code=0 comm=bwrap pid=9145 grp=1 res=0
>        evolution-9096    [002] d..2.  1207.016568: <stack trace>
>  => trace_event_raw_event_signal_generate
>  => __send_signal_locked
>  => do_send_sig_info
>  => do_exit
>  => do_group_exit
>  => get_signal
>  => arch_do_signal_or_restart
>  => irqentry_exit_to_user_mode
>  => asm_sysvec_apic_timer_interrupt
> ...
> and 32 other events of bwrap cleaning up.
>
> Does that shed any light? Other than that, it seems to be sending the
> signal to itself.
>
> -Yanko
> --
> _______________________________________________
> devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
> To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
> Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
> Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue