On Wed, Nov 17, 2021 at 6:51 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>
> Kyle Huey <me@xxxxxxxxxxxx> writes:
>
> > On Mon, Nov 15, 2021 at 9:31 PM Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> >>
> >> Kyle Huey recently reported[1] that rr gets confused if SIGKILL prevents
> >> ptrace_signal from delivering a signal, as the kernel sets up a signal
> >> frame for a signal that rr did not have a chance to observe with ptrace.
> >>
> >> In looking into it I found a couple of bugs and a quality of
> >> implementation issue.
> >>
> >> - The test for signal_group_exit should be inside the for loop in get_signal.
> >> - Signals should be requeued on the same queue they were dequeued from.
> >> - When a fatal signal is pending ptrace_signal should not return another
> >>   signal for delivery.
> >>
> >> Kyle Huey has verified[2] an earlier version of this change.
> >>
> >> I have reworked things one more time to completely fix the issues
> >> raised, and to keep the code maintainable long term.
> >>
> >> I have smoke tested this code and combined with a careful review I
> >> expect this code to work fine. Kyle, if you can double check that
> >> my last round of changes still works for rr I would appreciate it.
> >
> > This still fixes the race we reported.
> >
> > Tested-by: Kyle Huey <khuey@xxxxxxxxxxxx>
>
> Thank you very much for retesting.
>
> Eric

Thank you, Kyle and Eric, for reporting and fixing the root cause of
this race.

Meanwhile, following Kyle's suggestion, I will disable the crash
handlers in the tracee whenever it is being traced.

Marko
--
Marko Mäkelä, Lead Developer InnoDB
MariaDB Corporation