On Tue, Jul 25, 2023 at 07:45:25AM +1000, Dave Chinner wrote:
> On Mon, Jul 24, 2023 at 12:23:31PM +0100, Daniel Dao wrote:
> > Hi again,
> >
> > We had another example of xarray corruption involving xfs and zsmalloc. We are
> > running zram as swap. We have 2 tasks deadlock waiting for page to be released
>
> Do your problems on 6.1 go away if you stop using zram as swap?

I think zram is the victim here, not the culprit.  I think what's going
on is that -- somehow -- there are stale pointers in the xarray.  zram
allocates these pages (I suspect most of the memory in this machine is
allocated to zram or page cache) and then we blow up when finding a
folio in the page cache which has a ->mapping that is actually a
movable_ops structure.

But how do we get stale pointers in the xarray?  I've been worrying at
that problem for months.  At some point, the refcount must go down to
zero:

static inline void folio_put(struct folio *folio)
{
        if (folio_put_testzero(folio))
                __folio_put(folio);
}

(assume we're talking about a large folio; everything seems to point
that way):

__folio_put_large:
        if (!folio_test_hugetlb(folio))
                __page_cache_release(folio);
        destroy_large_folio(folio);

destroy_large_folio:
        free_transhuge_page()

free_transhuge_page:
        free_compound_page(page);

free_compound_page:
        free_the_page(page, compound_order(page));

free_the_page:
        __free_pages_ok(page, order, FPI_NONE);

__free_pages_ok:
        if (!free_pages_prepare(page, order, fpi_flags))

free_pages_prepare:
        if (PageMappingFlags(page))
                page->mapping = NULL;
(doesn't trigger; PageMappingFlags are false for page cache)
        if (is_check_pages_enabled()) {
                if (free_page_is_bad(page))

free_page_is_bad:
        if (likely(page_expected_state(page, PAGE_FLAGS_CHECK_AT_FREE)))
                return false;

        /* Something has gone sideways, find it */
        free_page_is_bad_report(page);

page_expected_state:
        if (unlikely((unsigned long)page->mapping | ...
                return false;

free_page_is_bad_report:
        bad_page(page, page_bad_reason(page, PAGE_FLAGS_CHECK_AT_FREE));

page_bad_reason:
        if (unlikely(page->mapping != NULL))
                bad_reason = "non-NULL mapping";

So (assuming that Daniel has check_pages_enabled set and isn't ignoring
important parts of dmesg, which seem like reasonable assumptions), the
last put of a folio must be after the folio has had its ->mapping
cleared.

But we remove the folio from the page cache in page_cache_delete(),
right before we set the mapping to NULL.  And again in
delete_from_page_cache_batch() (in the other order; I don't think that's
relevant?)

So where do we set folio->mapping to NULL without removing folio from
the XArray?  I'm beginning to suspect it's a mishandled failure in
split_huge_page(), so I'll re-review that code path tomorrow.
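
The invariant argued above can be restated as a small userspace toy
model.  This is only a sketch under loud assumptions: it is not kernel
code, every toy_* name is hypothetical, a fixed array stands in for the
XArray, and the refcounting is simplified so that the delete helpers
drop the cache's reference themselves.  It shows that a folio whose
->mapping was cleared without clearing its slot passes the free-time
mapping check and still leaves a stale pointer behind.

```c
/* Userspace toy model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 8

struct toy_folio {
        void *mapping;  /* stands in for folio->mapping */
        int refcount;   /* stands in for the folio refcount */
        bool freed;     /* set once the toy allocator took the folio back */
};

struct toy_cache {
        struct toy_folio *slots[TOY_SLOTS];     /* stands in for the XArray */
};

enum toy_put_result { TOY_STILL_REFERENCED, TOY_FREED, TOY_BAD_MAPPING };

/* Insert a folio: the cache takes one reference and sets ->mapping. */
static void toy_cache_add(struct toy_cache *c, size_t idx, struct toy_folio *f)
{
        assert(idx < TOY_SLOTS);
        f->mapping = c;
        f->refcount++;
        c->slots[idx] = f;
}

/* Correct removal: clear the slot, clear ->mapping, drop the cache's ref. */
static void toy_cache_delete(struct toy_cache *c, size_t idx)
{
        struct toy_folio *f;

        assert(idx < TOY_SLOTS && c->slots[idx] != NULL);
        f = c->slots[idx];
        c->slots[idx] = NULL;
        f->mapping = NULL;
        f->refcount--;
}

/* Hypothetical buggy path: ->mapping is cleared but the slot is forgotten. */
static void toy_buggy_detach(struct toy_folio *f)
{
        f->mapping = NULL;
        f->refcount--;
}

/* Drop one reference; on the last put, apply the "non-NULL mapping" check. */
static enum toy_put_result toy_folio_put(struct toy_folio *f)
{
        assert(f->refcount > 0);
        if (--f->refcount > 0)
                return TOY_STILL_REFERENCED;
        if (f->mapping != NULL)
                return TOY_BAD_MAPPING; /* the case a bad-page report would catch */
        f->freed = true;
        return TOY_FREED;
}

/* Count slots that still point at a folio the allocator has taken back. */
static size_t toy_count_stale(const struct toy_cache *c)
{
        size_t i, stale = 0;

        for (i = 0; i < TOY_SLOTS; i++)
                if (c->slots[i] != NULL && c->slots[i]->freed)
                        stale++;
        return stale;
}
```

In this model a folio removed with toy_cache_delete() frees cleanly and
leaves no stale slot, and a folio freed while ->mapping is still set
trips the check.  Only the toy_buggy_detach() path both passes the
check and leaves a stale slot, which is the shape of bug being looked
for here.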