On Tue, May 21, 2019 at 11:15:25AM +1000, Tobin C. Harding wrote:
> On Tue, May 21, 2019 at 12:51:57AM +0000, Roman Gushchin wrote:
> > On Mon, May 20, 2019 at 03:40:05PM +1000, Tobin C. Harding wrote:
> > > Internal fragmentation can occur within pages used by the slub
> > > allocator.  Under some workloads large numbers of pages can be used
> > > by partial slab pages.  This under-utilisation is bad not simply
> > > because it wastes memory but also because, if the system is under
> > > memory pressure, higher order allocations may become difficult to
> > > satisfy.  If we can defrag slab caches we can alleviate these
> > > problems.
> > >
> > > Implement Slab Movable Objects in order to defragment slab caches.
> > >
> > > Slab defragmentation may occur:
> > >
> > >  1. Unconditionally when __kmem_cache_shrink() is called on a slab
> > >     cache by the kernel calling kmem_cache_shrink().
> > >
> > >  2. Unconditionally through the use of the slabinfo command.
> > >
> > > 	slabinfo <cache> -s
> > >
> > >  3. Conditionally via the use of kmem_cache_defrag().
> > >
> > > - Use Slab Movable Objects when shrinking a cache.
> > >
> > >   Currently when the kernel calls kmem_cache_shrink() we curate the
> > >   partial slabs list.  We still do this if object migration is not
> > >   enabled for the cache; if, however, SMO is enabled we attempt to
> > >   move objects in partially full slabs in order to defragment the
> > >   cache.  Shrink attempts to move all objects in order to reduce the
> > >   cache to a single partial slab for each node.
> > >
> > > - Add conditional per-node defrag via a new function:
> > >
> > > 	kmem_defrag_slabs(int node)
> > >
> > >   kmem_defrag_slabs() attempts to defragment all slab caches for
> > >   node.  Defragmentation is done conditionally dependent on
> > >   MAX_PARTIAL _and_ defrag_used_ratio.
> > >
> > >   Caches are only considered for defragmentation if the number of
> > >   partial slabs exceeds MAX_PARTIAL (per node).
> > >
> > >   Also, defragmentation only occurs if the usage ratio of the slab
> > >   is lower than the configured percentage (sysfs field added in this
> > >   patch).  Usage ratios are measured by calculating the percentage
> > >   of objects in use compared to the total number of objects that the
> > >   slab page can accommodate.
> > >
> > >   The scanning of slab caches is optimized because the
> > >   defragmentable slabs come first on the list.  Thus we can
> > >   terminate scans on the first slab encountered that does not
> > >   support defragmentation.
> > >
> > >   kmem_defrag_slabs() takes a node parameter.  This can either be
> > >   -1 if defragmentation should be performed on all nodes, or a node
> > >   number.
> > >
> > >   Defragmentation may be disabled by setting the defrag ratio to 0:
> > >
> > > 	echo 0 > /sys/kernel/slab/<cache>/defrag_used_ratio
> > >
> > > - Add a defrag ratio sysfs field and set it to 30% by default.  A
> > >   limit of 30% specifies that more than 3 out of 10 available slots
> > >   for objects need to be in use; otherwise slab defragmentation will
> > >   be attempted on the remaining objects.
> > >
> > > In order for a cache to be defragmentable the cache must support
> > > object migration (SMO).  Enabling SMO for a cache is done via a call
> > > to the recently added function:
> > >
> > > 	void kmem_cache_setup_mobility(struct kmem_cache *,
> > > 				       kmem_cache_isolate_func,
> > > 				       kmem_cache_migrate_func);
> > >
> > > Co-developed-by: Christoph Lameter <cl@xxxxxxxxx>
> > > Signed-off-by: Tobin C. Harding <tobin@xxxxxxxxxx>
> > > ---
> > >  Documentation/ABI/testing/sysfs-kernel-slab |  14 +
> > >  include/linux/slab.h                        |   1 +
> > >  include/linux/slub_def.h                    |   7 +
> > >  mm/slub.c                                   | 385 ++++++++++++++++----
> > >  4 files changed, 334 insertions(+), 73 deletions(-)
> >
> > Hi Tobin!
> >
> > Overall this looks very good to me!  I'll take another look when you
> > post a non-RFC version, but so far I can't find any issues.
>
> Thanks for the reviews.
>
> > A generic question: as I understand it, you support only root
> > kmem_caches now.  Is kmemcg support planned?
>
> I know very little about cgroups and I have no plans for this work.
> However, I'm not the architect behind this - Christoph is guiding the
> direction on this one.  Perhaps he will comment.
>
> > Without it the patchset isn't as attractive to anyone using cgroups
> > as it could be.  Also, I hope it can solve (or mitigate) the
> > memcg-specific problem of scattering the vfs cache working set over
> > multiple generations of the same cgroup (their kmem_caches).
>
> I'm keen to work on anything that makes this more useful, so I'll do
> some research.  Thanks for the idea.

You're welcome!  I'm happy to help, or even to do it myself, once your
patches are merged.

Thanks!
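
[Editor's note: for readers who want to experiment with the series, below is
a minimal, illustrative sketch of how a cache owner might opt in to SMO.
Only the kmem_cache_setup_mobility() signature quoted above is taken from the
patch description; the callback prototypes, the "foo" cache, its object type
and the callback bodies are assumptions made for illustration and should be
checked against the earlier patches in the series.]

#include <linux/module.h>
#include <linux/init.h>
#include <linux/slab.h>

/* Hypothetical object type managed by the cache. */
struct foo {
	struct foo *next;	/* example pointer that migrate must fix up */
	unsigned long data;
};

static struct kmem_cache *foo_cachep;	/* hypothetical cache */

/*
 * Assumed callback shape: isolate is handed 'nr' candidate objects and
 * must pin them so they cannot be freed while migration runs; it may
 * return private data that is later handed to the migrate callback.
 */
static void *foo_isolate(struct kmem_cache *s, void **objs, int nr)
{
	/* e.g. take a per-object lock or elevate a refcount here */
	return NULL;
}

/*
 * Assumed callback shape: migrate allocates replacement objects
 * (preferably on 'node'), copies the contents across, repoints any
 * users at the new copies, then frees the old objects so the partial
 * slab they occupied can be emptied.
 */
static void foo_migrate(struct kmem_cache *s, void **objs, int nr,
			int node, void *private)
{
	/* copy objs[i] into new allocations and free the originals */
}

static int __init foo_init(void)
{
	foo_cachep = kmem_cache_create("foo", sizeof(struct foo), 0, 0, NULL);
	if (!foo_cachep)
		return -ENOMEM;

	/* Opt the cache in to SMO; it is now considered defragmentable. */
	kmem_cache_setup_mobility(foo_cachep, foo_isolate, foo_migrate);
	return 0;
}
module_init(foo_init);

Once a cache is set up this way, defragmentation can be driven as described
in the commit message: unconditionally via kmem_cache_shrink() or
"slabinfo foo -s", or conditionally via kmem_defrag_slabs(-1) for all nodes,
with the trigger threshold tuned through
/sys/kernel/slab/foo/defrag_used_ratio.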