On Tue, Jun 4, 2024 at 12:29 AM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
>
> Kairui Song <ryncsn@xxxxxxxxx> writes:
>
> > On Fri, May 31, 2024 at 10:37 AM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
> >
> > Doesn't limiting order-0 allocation break the bottom line that order-0
> > allocation is a first-class citizen and should not fail if there is
> > space?
>
> Sorry for the confusing words. I mean limiting the maximum number of
> order-0 swap entries allocated in workloads, instead of limiting that in
> the kernel.

What interface does it use to limit the order-0 swap entries? I was
thinking the kernel would enforce the high-order swap space reservation,
just like hugetlbfs does for huge pages. We will need to introduce some
interface to specify the reservation.

> > Just my two cents...
> >
> > I had a try locally based on Chris's work, allowing order 0 to use
> > nonfull_clusters as Ying has suggested, starting with the lowest order
> > and increasing the order until nonfull_clusters[order] is not empty;
> > that way, higher orders are better protected, because unless we run
> > out of both free_clusters and nonfull_clusters, a direct scan won't
> > happen.
> >
> > More concretely, I applied the following changes, which didn't change
> > the code much:
> > - In scan_swap_map_try_ssd_cluster, check nonfull_clusters first, then
> >   free_clusters, then discard_clusters.
> > - If it's order 0, also check nonfull_clusters[i] for each order
> >   (for (int i = 0; i < SWAP_NR_ORDERS; ++i)) before
> >   scan_swap_map_try_ssd_cluster returns false.
> >
> > A quick test, still using the memtier test, but with the swap device
> > size decreased from 10G to 8G for higher pressure.
> >
> > Before:
> > hugepages-32kB/stats/swpout:34013
> > hugepages-32kB/stats/swpout_fallback:266
> > hugepages-512kB/stats/swpout:0
> > hugepages-512kB/stats/swpout_fallback:77
> > hugepages-2048kB/stats/swpout:0
> > hugepages-2048kB/stats/swpout_fallback:1
> > hugepages-1024kB/stats/swpout:0
> > hugepages-1024kB/stats/swpout_fallback:0
> > hugepages-64kB/stats/swpout:35088
> > hugepages-64kB/stats/swpout_fallback:66
> > hugepages-16kB/stats/swpout:31848
> > hugepages-16kB/stats/swpout_fallback:402
> > hugepages-256kB/stats/swpout:390
> > hugepages-256kB/stats/swpout_fallback:7244
> > hugepages-128kB/stats/swpout:28573
> > hugepages-128kB/stats/swpout_fallback:474
> >
> > After:
> > hugepages-32kB/stats/swpout:31448
> > hugepages-32kB/stats/swpout_fallback:3354
> > hugepages-512kB/stats/swpout:30
> > hugepages-512kB/stats/swpout_fallback:33
> > hugepages-2048kB/stats/swpout:2
> > hugepages-2048kB/stats/swpout_fallback:0
> > hugepages-1024kB/stats/swpout:0
> > hugepages-1024kB/stats/swpout_fallback:0
> > hugepages-64kB/stats/swpout:31255
> > hugepages-64kB/stats/swpout_fallback:3112
> > hugepages-16kB/stats/swpout:29931
> > hugepages-16kB/stats/swpout_fallback:3397
> > hugepages-256kB/stats/swpout:5223
> > hugepages-256kB/stats/swpout_fallback:2351
> > hugepages-128kB/stats/swpout:25600
> > hugepages-128kB/stats/swpout_fallback:2194
> >
> > The high-order (256kB) swapout rate is significantly higher, and 512kB
> > swapout is now possible, which indicates that higher orders are better
> > protected; lower orders are sacrificed, but that seems worth it.
>
> Yes. I think that this reflects another aspect of the problem. In some
> situations, it's better to steal one high-order cluster and use it for
> order-0 allocations than to scatter order-0 allocations across random
> high-order clusters.

Agree, the scan loop on swap_map[] has the worst pollution.

Chris