On Tue, Sep 12, 2023 at 12:28:11PM -0400, Zi Yan wrote:
> From: Zi Yan <ziy@xxxxxxxxxx>
>
> Feel free to give comments and ask questions.

How about testing? I'm looking with an eye towards creating a
pathological situation which can be automated for fragmentation, to
see how things go. Mel Gorman's original artificial fragmentation
workload, taken from his first patches to help with fragmentation
avoidance from 2018, suggests he tried [0]:

------ From 2018

a) Create an XFS filesystem

b) Start 4 fio threads that write a number of 64K files inefficiently.
   Inefficiently means that files are created on first access and not
   created in advance (fio parameter create_on_open=1) and fallocate
   is not used (fallocate=none). With multiple IO issuers this creates
   a mix of slab and page cache allocations over time. The total size
   of the files is 150% physical memory so that the slabs and page
   cache pages get mixed.

c) Warm up a number of fio read-only threads accessing the same files
   created in step 2. This part runs for the same length of time it
   took to create the files. It'll fault back in old data and further
   interleave slab and page cache allocations. As it's now low on
   memory due to step 2, fragmentation occurs as pageblocks get
   stolen. While step 3 is still running, start a process that tries
   to allocate 75% of memory as huge pages with a number of threads.
   The number of threads is based on (NR_CPUS_SOCKET -
   NR_FIO_THREADS)/4 to avoid THP threads contending with fio, any
   other threads or forcing cross-NUMA scheduling. Note that the test
   has not been used on a machine with less than 8 cores. The
   benchmark records whether huge pages were allocated and what the
   fault latency was in microseconds.

d) Measure the number of events potentially causing external
   fragmentation, the fault latency and the huge page allocation
   success rate.

------- end of extract

These days we can probably do a bit more damage. There have been
concerns that LBS support (block size > page size) could make
fragmentation worse. One of the reasons is that any file created,
regardless of its size, will require at least the block size, so if
you're using a 64k block size that means a 64k allocation for each new
file on that 64k block size filesystem; clearly you may run out of
lower order allocations pretty quickly. You can also create different
large block filesystems too, one with 64k and another with 32k.

Although LBS is new and we're still ironing out the kinks, if you
wanna give it a go we've rebased the patches onto Linus' tree [1], and
if you wanted to ramp up fast you could use kdevops [2], which lets
you pick that branch and also a series of NVMe drives (by enabling
CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_NVME) for large IO experimentation
(by enabling CONFIG_VAGRANT_ENABLE_LARGEIO).

Creating different filesystems with large block sizes (64k, 32k, 16k)
on a 4k sector size drive (mkfs.xfs -f -b size=64k -s size=4k) should
let you easily do tons of crazy pathological things.

Are there other known recipes to help test this stuff? How do we
measure success in your patches for fragmentation exactly?

[0] https://lwn.net/Articles/770235/
[1] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=large-block-linus-nobdev
[2] https://github.com/linux-kdevops/kdevops

  Luis
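
A rough, untested sketch of how steps a)-c) above could be wired up
with fio on a 64k block size XFS filesystem. The device path, mount
point and thread count are placeholders; the fio options named in the
2018 extract (create_on_open, fallocate) are used as-is and the rest
(nrfiles, filesize, ioengine, runtime) are my own assumptions about a
reasonable setup, not Mel's harness:

#!/bin/bash
# Assumed device and mount point, adjust to taste.
DEV=/dev/nvme0n1
MNT=/mnt/xfs-64k
NR_FIO_THREADS=4

# Large block size filesystem on a 4k sector drive, as above.
mkfs.xfs -f -b size=64k -s size=4k "$DEV"
mkdir -p "$MNT"
mount "$DEV" "$MNT"

# Total file data is 150% of physical memory, spread over 64K files
# split across the fio threads.
MEM_KB=$(awk '/MemTotal/ { print $2 }' /proc/meminfo)
TOTAL_KB=$(( MEM_KB * 3 / 2 ))
NR_FILES=$(( TOTAL_KB / 64 / NR_FIO_THREADS ))

# Step b): write the files inefficiently, created on first open and
# without fallocate, to interleave slab and page cache allocations.
WRITE_START=$(date +%s)
fio --name=frag --directory="$MNT" --numjobs=$NR_FIO_THREADS \
    --nrfiles=$NR_FILES --filesize=64k --bs=64k --rw=write \
    --ioengine=psync --create_on_open=1 --fallocate=none \
    --group_reporting
WRITE_SECS=$(( $(date +%s) - WRITE_START ))

# Step c): read-only warm-up over the same files, for as long as the
# write phase took, faulting old data back in while memory is low.
fio --name=frag --directory="$MNT" --numjobs=$NR_FIO_THREADS \
    --nrfiles=$NR_FILES --filesize=64k --bs=64k --rw=read \
    --ioengine=psync --time_based --runtime=${WRITE_SECS}s \
    --group_reporting

# The concurrent THP allocation of 75% of memory and the measurements
# in step d) were done with Mel's own benchmark and are not reproduced
# here.

The same script could be repeated against 32k and 16k block size
filesystems (mkfs.xfs -b size=32k / -b size=16k) to mix allocation
sizes further.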