On Monday, October 18th, 2021 at 6:25 AM, Yu Zhao <yuzhao@xxxxxxxxxx> wrote:

> On Sun, Oct 17, 2021 at 10:47 AM Rune Kleveland rune.kleveland@xxxxxxxxxxxx wrote:
>
> > Hi!
> >
> > After applying the below patch, the 5 most problematic servers have run
> > without any issues for 23 hours. That never happened before the patch on
> > 5.14, so the patch seems to have fixed the issue for me.
>
> Confirm. I couldn't reproduce the problem on 5.14 either.

I'm also unable to reproduce the crash as for now. Thx for the patch.

Jordan

> > On Monday there will be more load on the servers, which caused them to
> > crash faster without the patch. I will let you know if it happens again.
> >
> > Best regards,
> >
> > Rune
> >
> > On 16/10/2021 00:10, Eric W. Biederman wrote:
> >
> > > In commit fda31c50292a ("signal: avoid double atomic counter
> > > increments for user accounting") Linus made a clever optimization to
> > > how rlimits and the struct user_struct. Unfortunately that
> > > optimization does not work in the obvious way when moved to nested
> > > rlimits. The problem is that the last decrement of the per user
> > > namespace per user sigpending counter might also be the last decrement
> > > of the sigpending counter in the parent user namespace as well. Which
> > > means that simply freeing the leaf ucount in __free_sigqueue is not
> > > enough.
> > >
> > > Maintain the optimization and handle the tricky cases by introducing
> > > inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
> > >
> > > By moving the entire optimization into functions that perform all of
> > > the work it becomes possible to ensure that every level is handled
> > > properly.
> > >
> > > I wish we had a single user across all of the threads whose rlimit
> > > could be charged so we did not need this complexity.
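
For readers following along, the idea in the quoted commit message can be
sketched in userspace roughly as below. This is only a simplified,
single-threaded model and not the kernel code: struct level, charge_get,
uncharge_put and unwind are hypothetical names, plain longs stand in for the
kernel's atomic counters, and nothing is ever freed. It only tries to show
that a reference is taken at every level whose count goes 0 -> 1 and dropped
at every level whose count goes 1 -> 0, rather than at the leaf alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one level of the user-namespace hierarchy. */
struct level {
    struct level *parent;  /* enclosing level, NULL at the root */
    long pending;          /* models the per-level sigpending count */
    long max;              /* models the rlimit at this level */
    long refs;             /* models the reference count on this level */
};

/* Undo the charges on levels [from, stop), dropping refs taken on 0 -> 1. */
static void unwind(struct level *from, struct level *stop)
{
    for (struct level *it = from; it != stop; it = it->parent) {
        if (--it->pending == 0)
            it->refs--;
    }
}

/*
 * Charge one pending item at the leaf and at every ancestor.
 * Returns the new leaf count, or 0 if some level's limit was exceeded
 * (in which case every charge made so far is rolled back).
 */
static long charge_get(struct level *leaf)
{
    long ret = 0;

    for (struct level *it = leaf; it != NULL; it = it->parent) {
        long now = ++it->pending;

        if (now > it->max) {
            it->pending--;      /* no reference was taken at this level */
            unwind(leaf, it);
            return 0;
        }
        if (it == leaf)
            ret = now;
        if (now == 1)
            it->refs++;         /* first charge at this level pins it */
    }
    return ret;
}

/* Drop one pending item at every level, unpinning levels that reach 0. */
static void uncharge_put(struct level *leaf)
{
    for (struct level *it = leaf; it != NULL; it = it->parent) {
        assert(it->pending > 0);
        if (--it->pending == 0)
            it->refs--;         /* last charge at this level unpins it */
    }
}
```

In this model, charging a leaf whose parent count was 0 pins both the leaf
and the parent, and the matching uncharge unpins both, which is the case the
commit message says a leaf-only put would miss.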