On Mon, Apr 25, 2022 at 1:59 PM Liam Howlett <liam.howlett@xxxxxxxxxx> wrote:
>
> * Yu Zhao <yuzhao@xxxxxxxxxx> [220425 14:06]:
> > On Wed, Apr 20, 2022 at 7:43 AM Liam Howlett <liam.howlett@xxxxxxxxxx> wrote:
> > >
> > > * Yu Zhao <yuzhao@xxxxxxxxxx> [220419 19:23]:
> > > > On Tue, Apr 19, 2022 at 5:18 PM Liam Howlett <liam.howlett@xxxxxxxxxx> wrote:
> > > > >
> > > > > * Yu Zhao <yuzhao@xxxxxxxxxx> [220419 17:59]:
> > > > > > On Tue, Apr 19, 2022 at 9:51 AM Liam Howlett <liam.howlett@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > * Yu Zhao <yuzhao@xxxxxxxxxx> [220416 15:30]:
> > > > > > > > On Sat, Apr 16, 2022 at 9:19 AM Liam Howlett <liam.howlett@xxxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > >
> > > > > > > > <snipped>
> > > > > > > >
> > > > > > > > > How did you hit this issue? Just on boot?
> > > > > > > >
> > > > > > > > I was hoping this is known to you or you have something I can verify for you.
> > > > > > >
> > > > > > >
> > > > > > > Thanks, yes. I believe that both crashes are the same root cause. The
> > > > > > > cause is that I was not cleaning up after the kmem bulk allocation
> > > > > > > failure on my side. Please test with this patch.
> > > > > > >
> > > > > > Thanks. I applied this patch and hit a LOCKDEP and then a BUG_ON:
> > > > > >
> > > > > > lib/maple_tree.c:847 suspicious rcu_dereference_protected() usage!
> > > > > > Call Trace:
> > > > > > <TASK>
> > > > > > dump_stack_lvl+0x6c/0x9a
> > > > > > dump_stack+0x10/0x12
> > > > > > lockdep_rcu_suspicious+0x12c/0x140
> > > > > > __mt_destroy+0x96/0xd0
> > > > > > exit_mmap+0x2a0/0x360
> > > > > > __mmput+0x34/0x100
> > > > > > mmput+0x2f/0x40
> > > > > > free_bprm+0x64/0xe0
> > > > > > kernel_execve+0x129/0x330
> > > > > > call_usermodehelper_exec_async+0xd8/0x130
> > > > > > ? proc_cap_handler+0x210/0x210
> > > > > > ret_from_fork+0x1f/0x30
> > > > > > </TASK>
> > > > > >
> > > > > Thanks - I'm not sure how this got through, but this should fix it.
> > > > >
> > > > > This should be added to 4236a642ad185 to avoid the LOCKDEP issue.
> > > > >
> > > > > --- a/mm/mmap.c
> > > > > +++ b/mm/mmap.c
> > > > > @@ -3163,9 +3163,9 @@ void exit_mmap(struct mm_struct *mm)
> > > > >
> > > > >         BUG_ON(count != mm->map_count);
> > > > >
> > > > > -       mmap_write_unlock(mm);
> > > > >         trace_exit_mmap(mm);
> > > > >         __mt_destroy(&mm->mm_mt);
> > > > > +       mmap_write_unlock(mm);
> > > > >         vm_unacct_memory(nr_accounted);
> > > > >  }
> > > >
> > > > Will try this.
> > >
> > >
> > > Andrew,
> > >
> > > Please add this fix to the commit 4236a642ad185 "mm: start tracking VMAs
> > > with maple tree"
> > >
> > > I've attached the patch for your convenience.
> >
> > Hi Liam,
> >
> > I assume you are still looking at the BUG_ON problem. I'll restart my
> > testing once you have something for me to try.
> >
> > Thanks.
>
> No, The above fix stopped the suspicious rcu dereference. I've found
> another issue in the mlock() code which I've also fixed.. but I needed
> to change my allocations from within the immap rwsem lock as it triggers
> a potential lockdep issue on high memory usage - lockdep complains about
> fs-reclaim lock. I've a patch set that works but I'm working through
> making it bisectable. I think the easiest thing is to integrate these
> fixes and the others sent to Andrew into a v8. I hope to have this done
> by the end of the day tomorrow.

No worries. Just wanted to make sure I didn't miss anything from you.
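
For anyone reading the exit_mmap() hunk quoted in this thread, the ordering change can be sketched as a small userspace toy. This is not kernel code. It assumes, as the lockdep trace and the fix together suggest but the thread does not spell out, that the dereference in __mt_destroy() is checked against the mmap write lock still being held. Every name here (toy_mm, toy_deref_protected, toy_destroy, toy_exit_unlock_first, toy_exit_destroy_first) is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an mm; none of these are the kernel's types. */
struct toy_mm {
	bool write_locked;	/* models holding the mmap write lock */
	int *tree;		/* models the tree root pointer */
	int suspicious;		/* counts dereferences made without the lock */
};

/*
 * Loosely models a lock-checked dereference: the access is flagged
 * when the lock the caller is expected to hold is not held.
 */
static int *toy_deref_protected(struct toy_mm *mm)
{
	assert(mm != NULL);
	if (!mm->write_locked)
		mm->suspicious++;
	return mm->tree;
}

/* Models tearing down the tree; it reads the root through the check. */
static void toy_destroy(struct toy_mm *mm)
{
	(void)toy_deref_protected(mm);
	mm->tree = NULL;
}

/* Ordering before the fix: unlock first, then destroy. */
static int toy_exit_unlock_first(struct toy_mm *mm)
{
	mm->write_locked = true;	/* lock taken earlier in the exit path */
	mm->write_locked = false;	/* unlock */
	toy_destroy(mm);		/* flagged: lock no longer held */
	return mm->suspicious;
}

/* Ordering after the fix: destroy while locked, then unlock. */
static int toy_exit_destroy_first(struct toy_mm *mm)
{
	mm->write_locked = true;	/* lock taken earlier in the exit path */
	toy_destroy(mm);		/* not flagged: lock still held */
	mm->write_locked = false;	/* unlock */
	return mm->suspicious;
}
```

In this toy, toy_exit_unlock_first() records one flagged dereference and toy_exit_destroy_first() records none, which mirrors why moving the unlock below the destroy call silences the warning. It says nothing about the separate BUG_ON or the fs-reclaim lockdep report mentioned in the thread.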