On Mon, Jul 24, 2023 at 10:45 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Mon, Jul 24, 2023 at 12:23:31PM +0100, Daniel Dao wrote:
> > Hi again,
> >
> > We had another example of xarray corruption involving xfs and zsmalloc. We are
> > running zram as swap. We have 2 tasks deadlock waiting for page to be released
>
> Do your problems on 6.1 go away if you stop using zram as swap?

We had xarray corruptions even on nodes without swap, so I'm not sure if swap
matters. The corruption on those nodes was noted in the first email with the
following trace:

BUG: kernel NULL pointer dereference, address: 0000000000000036
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 18806c5067 P4D 18806c5067 PUD 188ed48067 PMD 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 73 PID: 3579408 Comm: prometheus Tainted: G O 6.1.34-cloudflare-2023.6.7 #1
Hardware name: GIGABYTE R162-Z12-CD1/MZ12-HD4-CD, BIOS M03 11/19/2021
RIP: 0010:__filemap_get_folio (arch/x86/include/asm/atomic.h:29 include/linux/atomic/atomic-arch-fallback.h:1242 include/linux/atomic/atomic-arch-fallback.h:1267 include/linux/atomic/atomic-instrumented.h:608 include/linux/page_ref.h:238 include/linux/page_ref.h:247 include/linux/page_ref.h:280 include/linux/page_ref.h:313 mm/filemap.c:1863 mm/filemap.c:1915)

It's hard for us to run tests without zram swap at scale since the benefits are
significant with a lot of workloads.

Daniel.