On Fri, Mar 08, 2024 at 06:09:25PM +0000, Ryan Roberts wrote:
> I think the world is trying to tell me "its Friday night. Stop". I can no longer
> reproduce the non-NULL mapping oops that I was able to hit reliably this morning.

HEISENBUG!

> I do have this one though:
>
> [  197.332914] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
> [  197.340790] pc : deferred_split_scan+0x210/0x260
> [  197.341154] lr : deferred_split_scan+0x70/0x260
> [  197.347534] Call trace:
> [  197.347729]  deferred_split_scan+0x210/0x260
> [  197.348069]  do_shrink_slab+0x184/0x750
>
>
> deferred_split_scan+0x210/0x260 is the code that I added back:
>
> 	if (!folio_try_get(folio)) {
> 		/* We lost race with folio_put() */
> 		list_del_init(&folio->_deferred_list);	<<<< HERE
> 		ds_queue->split_queue_len--;
> 		continue;
> 	}
>
> We have the spinlock here so that really should not be happening. So does that
> mean the list is being manipulated outside of the lock somewhere? Or maybe its
> mapping (actually one of the deferred_list pointers being cleared by the buddy?
> I dunno... give up. Will resume on Monday. Have a good weekend.

This is actually congruent with a new theory I have, which is that
somewhere/somehow we're freeing the page without taking it off the
deferred list. I don't see such a path, but if it does exist, we could
absolutely corrupt the deferred_list in this way.

Just working on a patch to make my detection patch reliable ...

You have a good weekend too!
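The theory above is that a folio gets freed while it is still linked on the deferred list. If that happens, the queue keeps pointers into freed memory, and a later list_del_init() under the lock can trip over them. The kind of check that would catch this can be sketched in userspace. This is a minimal illustration only, not the actual detection patch. The list helpers reimplement a small subset of the kernel's list.h semantics. `struct fake_folio` and `fake_folio_free()` are hypothetical stand-ins for a folio and its free path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace re-implementation of a Linux-style intrusive doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static inline void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* An initialised, unlinked entry (or an empty head) points at itself. */
static inline bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static inline void list_add_tail(struct list_head *n, struct list_head *head)
{
	assert(list_empty(n));	/* debug check: entry must not already be queued */
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink an entry and re-initialise it so list_empty() on it returns true. */
static inline void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

/* Hypothetical stand-in for a folio: only the field relevant here. */
struct fake_folio {
	struct list_head deferred_list;
};

/*
 * Hypothetical free path with a detection check: refuse to free an object
 * that is still linked on a list, because the list would otherwise be left
 * holding dangling pointers into freed memory.
 */
static inline bool fake_folio_free(struct fake_folio *f)
{
	if (!list_empty(&f->deferred_list)) {
		fprintf(stderr, "BUG: freeing folio %p still on deferred list\n",
			(void *)f);
		return false;
	}
	free(f);
	return true;
}
```

With this check, freeing a queued `fake_folio` returns false and logs the problem. Calling `list_del_init()` first lets the free succeed and leaves the queue empty. Reporting at free time points at the offending caller, rather than at whichever later list operation happens to hit the corruption.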