On Thu, Jun 6, 2024 at 11:42 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> On Thu, Jun 6, 2024 at 10:14 AM Takero Funaki <flintglass@xxxxxxxxx> wrote:
> >
> > On Thu, Jun 6, 2024 at 8:42, Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> >
> > > I think there are multiple ways to go forward here:
> > > (a) Make the number of zpools a config option, leave the default as
> > > 32, but allow special use cases to set it to 1 or similar. This is
> > > probably not preferable because it is not clear to users how to set
> > > it, but the idea is that no one will have to set it except special use
> > > cases such as Erhard's (who will want to set it to 1 in this case).
> > >
> > > (b) Make the number of zpools scale linearly with the number of CPUs.
> > > Maybe something like nr_cpus/4 or nr_cpus/8. The problem with this
> > > approach is that with a large number of CPUs, too many zpools will
> > > start having diminishing returns. Fragmentation will keep increasing,
> > > while the scalability/concurrency gains will diminish.
> > >
> > > (c) Make the number of zpools scale logarithmically with the number of
> > > CPUs. Maybe something like 4log2(nr_cpus). This will keep the number
> > > of zpools from increasing too much and close to the status quo. The
> > > problem is that at a small number of CPUs (e.g. 2), 4log2(nr_cpus)
> > > will actually give a nr_zpools > nr_cpus. So we will need to come up
> > > with a fancier magic equation (e.g. 4log2(nr_cpus/4)).
> > >
> >
> > I just posted a patch to limit the number of zpools, with some
> > theoretical background explained in the code comments. I believe that
> > scaling linearly at 2 * nr_cpus is sufficient to reduce contention,
> > but the scale can be reduced further. All CPUs trying to allocate/free
> > zswap at the same time is unlikely to happen.
> > How many concurrent accesses were the original 32 zpools supposed to
> > handle? I think it was for 16 CPUs or more. Or would nr_cpus/4 be
> > enough?
>
> We use 32 zpools on machines with 100s of CPUs. Two zpools per CPU is
> overkill imo.

Not to choose a camp; just a friendly note on why I strongly disagree
with the N zpools per CPU approach:
1. It is fundamentally flawed to assume the system is linear;
2. Nonlinear systems usually have diminishing returns.

For Google data centers, using nr_cpus as the scaling factor has long
since passed the acceptable ROI threshold. Per-CPU data, especially
when compounded per memcg or even per process, is probably the
number-one overhead in terms of DRAM efficiency.

> I have further comments that I will leave on the patch, but I mainly
> think this should be driven by real data, not theoretical possibility
> of lock contention.
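
For reference, the scaling rules discussed in this thread can be compared
side by side with a small Python sketch. This is only an illustration, not
code from any posted patch: the function names, the default divisor and
factor of 4, the floor of one zpool, and the use of an integer floor log2
(similar in spirit to the kernel's ilog2()) are all assumptions made here.

```python
def ilog2(n: int) -> int:
    """Integer floor of log2(n) for n >= 1 (assumed rounding for this sketch)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return n.bit_length() - 1


def zpools_per_cpu(nr_cpus: int, per_cpu: int = 2) -> int:
    """N zpools per CPU, e.g. the 2 * nr_cpus proposal."""
    return per_cpu * nr_cpus


def zpools_linear(nr_cpus: int, divisor: int = 4) -> int:
    """Option (b): nr_cpus / divisor, with an assumed floor of one zpool."""
    return max(1, nr_cpus // divisor)


def zpools_log(nr_cpus: int, factor: int = 4) -> int:
    """Option (c): factor * log2(nr_cpus), unclamped to expose the small-CPU problem."""
    return factor * ilog2(nr_cpus)


if __name__ == "__main__":
    # Hypothetical CPU counts chosen only to show how each rule grows.
    print(f"{'nr_cpus':>8} {'2*cpus':>8} {'cpus/4':>8} {'4log2':>8}")
    for nr_cpus in (2, 4, 16, 64, 256):
        print(f"{nr_cpus:>8} {zpools_per_cpu(nr_cpus):>8} "
              f"{zpools_linear(nr_cpus):>8} {zpools_log(nr_cpus):>8}")
```

Under these assumptions, the logarithmic rule yields 4 zpools for 2 CPUs
(more zpools than CPUs, the problem noted in option (c)) and 32 zpools for
256 CPUs, while the per-CPU rule keeps growing with the CPU count.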