On Fri, Aug 23, 2024 at 1:24 PM Piotr Oniszczuk <piotr.oniszczuk@xxxxxxxxx> wrote:
>
>
> Message written by Nhat Pham <nphamcs@xxxxxxxxx> on 23.08.2024, at 18:16:
> >
> > Have you tried with 6.9 yet? IIRC, there are two major changes to
> > zswap architecture in recent versions.
>
> No. But I'm now building vanilla 6.9.12. Will install and see…
> (This will take some time, as catching the issue needs days of compilation)
>
> >
> > 1. In 6.9, we range-partition zswap's rbtrees to reduce lock contention.
> >
> > 2. In 6.10, we replace zswap's rbtrees with xarrays.
> >
> > If 6.9 is fine, then the latter is the suspect, and vice versa. Of
> > course, the minor changes are still suspect - but you get the idea :)
> >
> >>
> >> btw: we can go with an elimination strategy.
> >> So what do I need to change/disable to get closer to finding the root cause?
> >
> > Could you let me know more about the setup? A couple things come to my mind:
> >
> > 1. zswap configs (allocator - is it zsmalloc? compressor?)
>
> Well - I'm not using zswap. But the bug happens in the zswap path? :)

Could you do:

grep . /sys/module/zswap/parameters/*

>
> [root@minimyth2-aarch64-next piotro]# swapon -s
> Filename          Type        Size      Used    Priority
> /dev/nvme0n1p3    partition   16776188  294164  -2
>
> >
> > 2. Is mTHP enabled? mTHP swapout was merged in 6.10, and there seems
>
> I don't have the config I used at hand at the moment, but in
> /sys/kernel/mm/transparent_hugepage I see:
>
> │/hugepages-1024kB
> │/hugepages-128kB
> │/hugepages-16kB
> │/hugepages-2048kB
> │/hugepages-256kB
> │/hugepages-32kB
> │/hugepages-512kB
> │/hugepages-64kB
>
> > to be some conflicts with zswap, but Yosry will know more about this
> > than me...
> >
> > 3. Is there any proprietary driver etc.?
> >
>
> Only 2, both ryzen9 monitoring related:
> https://github.com/leogx9r/ryzen_smu/commits/master
> https://github.com/ocerman/zenpower/commits/master
>

The reason I asked this is because I've seen a proprietary driver screwing
with memory in the past - it was an NVIDIA one though.

https://lore.kernel.org/linux-mm/CAKbZUD1-kqfuV0U+KDKPkQbm=RwzD_A1H3qk_c+bw92CqtMbuw@xxxxxxxxxxxxxx/

Also decompression step failure (albeit in the writeback path)
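
For reference, the two pieces of information asked for in this thread (the zswap module parameters and the per-size mTHP settings) could be gathered in one go with something like the sketch below. This is only an illustration and was not posted by anyone in the thread: the function name `dump_mm_settings` and its optional root-directory argument are hypothetical. It assumes the per-size `enabled` files under /sys/kernel/mm/transparent_hugepage/hugepages-*kB/ that kernels with mTHP support expose.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print zswap parameters and
# the per-size mTHP "enabled" settings. The optional first argument is
# the sysfs root, defaulting to /sys.
dump_mm_settings() {
    root="${1:-/sys}"

    echo "== zswap parameters =="
    if [ -d "$root/module/zswap/parameters" ]; then
        # Same idea as "grep . /sys/module/zswap/parameters/*";
        # -H forces the file name prefix even if only one file matches.
        grep -H . "$root"/module/zswap/parameters/* 2>/dev/null || true
    else
        echo "no zswap parameters directory under $root"
    fi

    echo "== mTHP per-size settings =="
    for f in "$root"/kernel/mm/transparent_hugepage/hugepages-*/enabled; do
        # Skip the unexpanded pattern when no per-size directory exists.
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$f" "$(cat "$f")"
    done
}
```

Running `dump_mm_settings` with no argument reads the live /sys tree. Any per-size entry whose value is not `never` may allow mTHP allocation for that size; `inherit` follows the top-level transparent_hugepage setting.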