On Tue, Feb 25, 2025 at 6:40 PM Yosry Ahmed <yosry.ahmed@xxxxxxxxx> wrote:
>
> On Tue, Feb 25, 2025 at 01:32:00PM -0800, Nhat Pham wrote:
> > Currently, we crash the kernel when a decompression failure occurs in
> > zswap (either because of memory corruption, or a bug in the compression
> > algorithm). This is overkill. We should only SIGBUS the unfortunate
> > process asking for the zswap entry on zswap load, and skip the corrupted
> > entry in zswap writeback.
>
> It's probably worth adding more details here. For example, how the
> SIGBUS is delivered (it's not super clear in the code), and the
> distinction between handling decompression failures for loads and
> writeback.

Will do :)

> >
> > See [1] for a recent upstream discussion about this.
> >
> > [1]: https://lore.kernel.org/all/ZsiLElTykamcYZ6J@xxxxxxxxxxxxxxxxxxxx/
> >
> > Suggested-by: Matthew Wilcox <willy@xxxxxxxxxxxxx>
> > Suggested-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
>
> It's Yosry Ahmed <yosry.ahmed@xxxxxxxxx> now :P

Oops, will fix :P

>
> > Signed-off-by: Nhat Pham <nphamcs@xxxxxxxxx>
> > ---
> >  mm/zswap.c | 85 +++++++++++++++++++++++++++++++++++++-----------------
> >  1 file changed, 58 insertions(+), 27 deletions(-)
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index f6316b66fb23..31d4397eed61 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -62,6 +62,8 @@ static u64 zswap_reject_reclaim_fail;
> >  static u64 zswap_reject_compress_fail;
> >  /* Compressed page was too big for the allocator to (optimally) store */
> >  static u64 zswap_reject_compress_poor;
> > +/* Load and writeback failed due to decompression failure */
>
> Nit: Load or writeback?

Consider it done :)

> >
> > +static u64 zswap_reject_decompress_fail;
> >  /* Store failed because underlying allocator could not get memory */
> >  static u64 zswap_reject_alloc_fail;
> >  /* Store failed because the entry metadata could not be allocated (rare) */
> > @@ -953,11 +955,12 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
> >  	return comp_ret == 0 && alloc_ret == 0;
> >  }
> >
> > -static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
> > +static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
> >  {
> >  	struct zpool *zpool = entry->pool->zpool;
> >  	struct scatterlist input, output;
> >  	struct crypto_acomp_ctx *acomp_ctx;
> > +	bool ret = true;
> >  	u8 *src;
> >
> >  	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> > @@ -984,12 +987,19 @@ static void zswap_decompress(struct zswap_entry *entry, struct folio *folio)
> >  	sg_init_table(&output, 1);
> >  	sg_set_folio(&output, folio, PAGE_SIZE, 0);
> >  	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, PAGE_SIZE);
> > -	BUG_ON(crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait));
> > -	BUG_ON(acomp_ctx->req->dlen != PAGE_SIZE);
> > +	if (crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait) ||
> > +			acomp_ctx->req->dlen != PAGE_SIZE) {
> > +		ret = false;
> > +		zswap_reject_decompress_fail++;
> > +		pr_alert_ratelimited(
> > +			"decompression failed on zswap entry with offset %08lx\n",
> > +			entry->swpentry.val);
> > +	}
>
> This can probably be prettier if we save the return value of
> crypto_wait_req() in a variable, and then do the check at the end of the
> function:
>
> zswap_decompress()
> {
> 	mutex_lock();
> 	...
> 	ret = crypto_wait_req(..);
> 	...
> 	mutex_unlock();
> 	...
> 	if (ret || acomp_ctx->req->dlen != PAGE_SIZE) {

Hmmmm, do we need the mutex to protect acomp_ctx->req->dlen?

No strong opinions here from my end TBH (there is a rough sketch of one
way to keep the dlen read under the mutex at the end of this mail).

> 		/* SHIT */
> 		return false;
> 	}
> 	return true;
> }

[...]

> We are doing load + erase here and in the writeback path now, so two
> xarray walks instead of one. How does this affect performance? We had a
> similar discussion about the possibility of doing a lockless xas_load()
> followed by a conditional xas_erase() for zswap_invalidate():
>
> https://lore.kernel.org/lkml/20241018192525.95862-1-ryncsn@xxxxxxxxx/
>
> Unfortunately it seems like we can't trivially do that unless we keep
> the tree locked, which we probably don't want to do throughout
> decompression.
>
> How crazy would it be to remove the entry from the tree and re-add it if
> decompression fails? Does swapcache_prepare() provide sufficient
> protection for us to do this without anyone else looking at this entry
> (seems like it)?

My concern is: what happens if xa_store() in the re-add path fails
because we do not have enough memory? (See the sketch at the end of this
mail for where that allocation sits.)

> Anyway, this is all moot if the second walk is not noticeable from a
> perf perspective.
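
For reference, a rough sketch of the restructuring suggested above (not
the actual patch; the elided setup/teardown is hand-waved). It snapshots
dlen while the mutex is still held, so the final check does not touch the
shared request after unlock, which would also sidestep my question:

static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
{
	struct crypto_acomp_ctx *acomp_ctx;
	unsigned int dlen;
	int ret;

	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
	mutex_lock(&acomp_ctx->mutex);
	/* ... map the source, set up sg lists, acomp_request_set_params() ... */
	ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req),
			      &acomp_ctx->wait);
	/* Read dlen before dropping the mutex so the check below is safe. */
	dlen = acomp_ctx->req->dlen;
	mutex_unlock(&acomp_ctx->mutex);
	/* ... unmap the source / zpool_unmap_handle() ... */

	if (ret || dlen != PAGE_SIZE) {
		zswap_reject_decompress_fail++;
		pr_alert_ratelimited(
			"decompression failed on zswap entry with offset %08lx\n",
			entry->swpentry.val);
		return false;
	}
	return true;
}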
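
And a purely illustrative sketch of the "erase up front, re-add on
failure" idea, just to show where the re-add itself can fail. The helper
name and call sites are made up, not taken from mm/zswap.c:

static bool zswap_load_single_walk(struct xarray *tree, pgoff_t offset,
				   struct zswap_entry *entry,
				   struct folio *folio)
{
	void *old;

	/* Single tree walk: take the entry out up front. */
	xa_erase(tree, offset);

	if (zswap_decompress(entry, folio))
		return true;	/* success: the entry stays erased */

	/* Decompression failed: try to put the entry back. */
	old = xa_store(tree, offset, entry, GFP_KERNEL);
	if (xa_is_err(old)) {
		/*
		 * This is the open question: under memory pressure
		 * xa_store() can fail, and the corrupted entry no longer
		 * has a slot in the tree to go back to.
		 */
	}
	return false;
}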