Re: kswapd0: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0 (Kernel v6.5.9, 32bit ppc)

Yu Zhao <yuzhao@xxxxxxxxxx> · Thu, 6 Jun 2024 11:55:00 -0600

On Thu, Jun 6, 2024 at 11:42 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> On Thu, Jun 6, 2024 at 10:14 AM Takero Funaki <flintglass@xxxxxxxxx> wrote:
> >
> > On Thu, Jun 6, 2024 at 8:42 Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> >
> > > I think there are multiple ways to go forward here:
> > > (a) Make the number of zpools a config option, leave the default as
> > > 32, but allow special use cases to set it to 1 or similar. This is
> > > probably not preferable because it is not clear to users how to set
> > > it, but the idea is that no one will have to set it except special use
> > > cases such as Erhard's (who will want to set it to 1 in this case).
> > >
> > > (b) Make the number of zpools scale linearly with the number of CPUs.
> > > Maybe something like nr_cpus/4 or nr_cpus/8. The problem with this
> > > approach is that with a large number of CPUs, too many zpools will
> > > start having diminishing returns. Fragmentation will keep increasing,
> > > while the scalability/concurrency gains will diminish.
> > >
> > > (c) Make the number of zpools scale logarithmically with the number of
> > > CPUs. Maybe something like 4log2(nr_cpus). This will keep the number
> > > of zpools from increasing too much and close to the status quo. The
> > > problem is that at a small number of CPUs (e.g. 2), 4log2(nr_cpus)
> > > will actually give nr_zpools > nr_cpus. So we will need to come up
> > > with a fancier magic equation (e.g. 4log2(nr_cpus/4)).
> > >
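
To make options (b) and (c) above concrete, here is a minimal sketch in C.
It is not taken from any posted patch: every function name is hypothetical,
and the cap at nr_cpus in the logarithmic variant is an assumption standing
in for the "fancier equation" mentioned in (c). Option (a) is simply a
constant and needs no code.

```c
/* Illustrative only: hypothetical helpers, not kernel code. */

/* floor(log2(n)) for n >= 1; returns 0 for n == 0 so the sketch is total. */
static inline unsigned int sketch_ilog2(unsigned int n)
{
	unsigned int r = 0;

	while (n >>= 1)
		r++;
	return r;
}

/* Never report fewer than one zpool. */
static inline unsigned int at_least_one(unsigned int n)
{
	return n ? n : 1;
}

/* Option (b): nr_cpus / divisor (e.g. 4 or 8). divisor must be non-zero. */
static inline unsigned int zpools_linear(unsigned int nr_cpus,
					 unsigned int divisor)
{
	return at_least_one(nr_cpus / divisor);
}

/* The per-CPU proposal discussed in the thread: N zpools per CPU. */
static inline unsigned int zpools_per_cpu(unsigned int nr_cpus,
					  unsigned int per_cpu)
{
	return at_least_one(nr_cpus * per_cpu);
}

/*
 * Option (c): 4 * log2(nr_cpus). Capping the result at nr_cpus is an
 * assumption of this sketch, used to avoid nr_zpools > nr_cpus on
 * small machines.
 */
static inline unsigned int zpools_log(unsigned int nr_cpus)
{
	unsigned int n = 4 * sketch_ilog2(nr_cpus);

	if (n > nr_cpus)
		n = nr_cpus;
	return at_least_one(n);
}
```

With these definitions, a 256-CPU machine gets 512 zpools at two per CPU,
32 with nr_cpus/8, and 32 with the capped logarithmic rule; a 2-CPU machine
gets 2 with the capped logarithmic rule.
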
> >
> > I just posted a patch to limit the number of zpools, with some
> > theoretical background explained in the code comments. I believe that
> > scaling linearly at 2 zpools per CPU is sufficient to reduce contention,
> > but the scale can be reduced further. It is unlikely that all CPUs will
> > be allocating from or freeing to zswap at the same time.
> > How many concurrent accesses were the original 32 zpools supposed to
> > handle? I think it was for 16 CPUs or more. Or would nr_cpus/4 be
> > enough?
>
> We use 32 zpools on machines with hundreds of CPUs. Two zpools per CPU
> is overkill imo.

Not to choose a camp; just a friendly note on why I strongly disagree
with the N zpools per CPU approach:
1. It is fundamentally flawed to assume the system is linear;
2. Nonlinear systems usually have diminishing returns.

For Google data centers, using nr_cpus as the scaling factor passed the
acceptable ROI threshold long ago. Per-CPU data, especially when
compounded per memcg or even per process, is probably the number-one
overhead in terms of DRAM efficiency.
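
A rough sketch of why the compounding matters, in C. The function name and
all three parameters are hypothetical, and the numbers in the note below are
made-up inputs for illustration, not measurements:

```c
#include <stdint.h>

/*
 * Illustrative only: bytes consumed by a structure that is replicated
 * once per CPU and once per memcg. The footprint is the product of all
 * three factors, so it grows with nr_cpus even if the per-instance size
 * and the number of memcgs stay constant.
 */
static inline uint64_t percpu_footprint_bytes(uint64_t nr_cpus,
					      uint64_t nr_memcgs,
					      uint64_t bytes_per_instance)
{
	return nr_cpus * nr_memcgs * bytes_per_instance;
}
```

As a purely hypothetical example, a 64-byte structure replicated across
256 CPUs and 1000 memcgs comes to 16384000 bytes, while the same structure
replicated a fixed 32 times per memcg comes to 2048000 bytes.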

> I have further comments that I will leave on the patch, but I mainly
> think this should be driven by real data, not theoretical possibility
> of lock contention.