On Thu, Mar 06, 2025 at 12:50:10PM -0800, Nhat Pham wrote: > Currently, we crash the kernel when a decompression failure occurs in > zswap (either because of memory corruption, or a bug in the compression > algorithm). This is overkill. We should only SIGBUS the unfortunate > process asking for the zswap entry on zswap load, and skip the corrupted > entry in zswap writeback. > > See [1] for a recent upstream discussion about this. > > The zswap writeback case is relatively straightforward to fix. For the > zswap_load() case, we change the return behavior: > > * Return 0 on success. > * Return -ENOENT (with the folio locked) if zswap does not own the > swapped out content. > * Return -EIO if zswap owns the swapped out content, but encounters a > decompression failure for some reasons. The folio will be unlocked, > but not be marked up-to-date, which will eventually cause the process > requesting the page to SIGBUS (see the handling of not-up-to-date > folio in do_swap_page() in mm/memory.c), without crashing the kernel. > * Return -EINVAL if we encounter a large folio, as large folio should > not be swapped in while zswap is being used. Similar to the -EIO case, > we also unlock the folio but do not mark it as up-to-date to SIGBUS > the faulting process. > > As a side effect, we require one extra zswap tree traversal in the load > and writeback paths. Quick benchmarking on a kernel build test shows no > performance difference: > > With the new scheme: > real: mean: 125.1s, stdev: 0.12s > user: mean: 3265.23s, stdev: 9.62s > sys: mean: 2156.41s, stdev: 13.98s > > The old scheme: > real: mean: 125.78s, stdev: 0.45s > user: mean: 3287.18s, stdev: 5.95s > sys: mean: 2177.08s, stdev: 26.52s > > [1]: https://lore.kernel.org/all/ZsiLElTykamcYZ6J@xxxxxxxxxxxxxxxxxxxx/ > > Suggested-by: Matthew Wilcox <willy@xxxxxxxxxxxxx> > Suggested-by: Yosry Ahmed <yosry.ahmed@xxxxxxxxx> > Suggested-by: Johannes Weiner <hannes@xxxxxxxxxxx> > Signed-off-by: Nhat Pham <nphamcs@xxxxxxxxx> Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>