On Thu, Jun 6, 2024 at 6:43 AM Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx> wrote:
>
> On (24/06/06 12:46), Chengming Zhou wrote:
> > >> Agree, I think we should try to improve locking scalability of zsmalloc.
> > >> I have some thoughts to share, no code or test data yet:
> > >>
> > >> 1. First, we can change the pool global lock to per-class lock, which
> > >> is more fine-grained.
> > >
> > > Commit c0547d0b6a4b6 "zsmalloc: consolidate zs_pool's migrate_lock
> > > and size_class's locks" [1] claimed no significant difference
> > > between class->lock and pool->lock.
> >
> > Ok, I haven't looked into the history much, that seems preparation of trying
> > to introduce reclaim in the zsmalloc? Not sure. But now with the reclaim code
> > in zsmalloc has gone, should we change back to the per-class lock? Which is
>
> Well, the point that commit made was that Nhat (and Johannes?) were
> unable to detect any impact of pool->lock on a variety of cases. So
> we went on with code simplification.

Yeah, we benchmarked it before zsmalloc writeback was introduced (the
patch to remove the class lock was a prep patch of that series). We
weren't able to detect any regression at the time with just a global
pool lock.

>
> > obviously more fine-grained than the pool lock. Actually, I have just done it,
> > will test to get some data later.
>
> Thanks, we'll need data on this. I'm happy to take the patch, but
> jumping back and forth between class->lock and pool->lock merely
> "for obvious reasons" is not what I'm extremely excited about.

FWIW, I do think it'd be nice if we could make the locking more
granular - the pool lock is now essentially a global lock, and we're
just getting around that by replicating the (z)pools themselves.

Personally, I'm not super convinced about class locks. We're
essentially relying on the post-compression size of the data to
load-balance the queries - I can imagine a scenario where a workload
has a concentrated distribution of post-compression sizes (i.e. its
pages are compressed to similar-ish sizes), and we're once again
contending for a (few) lock(s).

That said, I'll let the data tell the story :) We don't need a perfect
solution, just a good enough solution for now.
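To make the class-lock concern above concrete, here is a rough,
compile-only userspace sketch of the two locking schemes being
compared. This is not the zsmalloc code: the names (zs_pool_sketch,
size_class_sketch, size_to_class, NR_SIZE_CLASSES, CLASS_GRANULARITY)
and the pthread mutexes are stand-ins for illustration only.

#include <pthread.h>
#include <stddef.h>

#define NR_SIZE_CLASSES		256	/* illustrative, not the real class count */
#define CLASS_GRANULARITY	16	/* illustrative step between class sizes */

struct size_class_sketch {
	pthread_mutex_t lock;	/* per-class scheme: one lock per size class */
	/* ... free lists, per-class stats ... */
};

struct zs_pool_sketch {
	pthread_mutex_t lock;	/* pool scheme: every alloc/free takes this */
	struct size_class_sketch classes[NR_SIZE_CLASSES];
};

/* The class - and therefore the lock - is picked by post-compression size. */
static size_t size_to_class(size_t size)
{
	size_t idx = size / CLASS_GRANULARITY;

	return idx < NR_SIZE_CLASSES ? idx : NR_SIZE_CLASSES - 1;
}

/* Pool scheme: every allocation in the pool contends on the one pool lock. */
static void alloc_obj_pool_lock(struct zs_pool_sketch *pool, size_t size)
{
	(void)size;
	pthread_mutex_lock(&pool->lock);
	/* ... carve an object out of some zspage ... */
	pthread_mutex_unlock(&pool->lock);
}

/*
 * Per-class scheme: two allocations only contend if their compressed
 * sizes map to the same class.  If most pages compress to similar
 * sizes, they all map to the same few classes and the contention is
 * back - which is the scenario described above.
 */
static void alloc_obj_per_class(struct zs_pool_sketch *pool, size_t size)
{
	struct size_class_sketch *class = &pool->classes[size_to_class(size)];

	pthread_mutex_lock(&class->lock);
	/* ... carve an object out of this class's zspages ... */
	pthread_mutex_unlock(&class->lock);
}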