On Sun, Jun 30, 2024 at 10:58 AM Pedro Falcato <pedro.falcato@xxxxxxxxx> wrote:
>
> Hi everyone,

Hi Pedro,

Thanks for the bug report! Taking a look now - some preliminary questions to narrow down the suspects and aid the debugging process:

a) Do you observe this bug in 6.8? 6.10?

b) Have you run the faddr2line script to verify that the line that triggers the crash is count_objcg_event(entry->objcg, ZSWPWB);?

c) Do you have a full dmesg log? Or maybe some other reproduction instructions?

If entry->objcg is garbage, then this smells like a lifetime/reference counting issue. Either:

a) The zswap entry itself is garbage. Not impossible, but seems unlikely. In 6.9, we effectively isolate the entry first through the swap cache, then check and remove it from the zswap tree (under the tree's lock). The former locks out concurrent accessors, and the latter should have taken care of invalidated entries (and prevents future invalidation attempts).

Furthermore, after this, if the entry is somehow garbage (i.e. freed and recycled), it should also be possible to blow up in the decompression step first, by feeding a garbage handle to zsmalloc and crashing the kernel at that point. IOW, we should also see zsmalloc crashes in addition to this particular crash, no? I cannot think of any protection mechanism that applies to the decompression step and not to count_objcg_event().

b) entry->objcg has been freed/recycled under us. This is much trickier, as the culprit could be any holder of the objcg reference who accidentally double-released the reference it held. That said, if it only happened on the zswap shrinker path, then maybe there is something to this...

Let me muse on this a bit more. Please let us know if you have other clues, traces, hints, or observations - it will help the investigation a lot!
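
To make scenario (b) concrete, here is a minimal userspace sketch of how one holder's double-release can invalidate a pointer that another holder still relies on. This is not kernel code: struct obj, struct entry, obj_get(), obj_put() and simulate() are all hypothetical names standing in for the objcg, the zswap entry and the reference helpers, and the "free" is only modeled with a flag so the example stays well-defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted object such as an objcg. */
struct obj {
	int refs;      /* outstanding references */
	bool released; /* set when refs hits zero; models the real free */
};

static void obj_init(struct obj *o)
{
	o->refs = 1;           /* the creator holds the first reference */
	o->released = false;
}

static void obj_get(struct obj *o)
{
	o->refs++;
}

static void obj_put(struct obj *o)
{
	if (--o->refs == 0)
		o->released = true; /* a real implementation would free here */
}

/* Hypothetical stand-in for a zswap entry that pins its objcg. */
struct entry {
	struct obj *objcg;
};

/* True if the entry's pointer still refers to a live object. */
static bool entry_objcg_is_live(const struct entry *e)
{
	return !e->objcg->released;
}

/*
 * Holder A creates the object, the entry takes its own reference, then
 * holder A drops its reference. With buggy_double_put set, holder A
 * drops a second time, consuming the reference the entry owns.
 */
static bool simulate(bool buggy_double_put)
{
	struct obj cg;
	struct entry e;

	obj_init(&cg);         /* holder A: refs == 1 */
	obj_get(&cg);          /* entry's reference: refs == 2 */
	e.objcg = &cg;

	obj_put(&cg);          /* holder A releases: refs == 1 */
	if (buggy_double_put)
		obj_put(&cg);  /* bug: refs == 0, object "freed" */

	return entry_objcg_is_live(&e);
}
```

Calling simulate(false) leaves the entry's pointer valid, while simulate(true) leaves the entry holding a pointer to a released object even though the entry itself did nothing wrong. That is why the culprit in (b) could be any holder of the reference, not necessarily the path that crashes.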