On Sun, Feb 16, 2020 at 12:15:54PM +0800, Wen Yang wrote:
> 
> 
> On 2020/2/13 6:52 AM, Andrew Morton wrote:
> > On Sat, 1 Feb 2020 11:15:02 +0800 Wen Yang <wenyang@xxxxxxxxxxxxxxxxx> wrote:
> > 
> > > The lock, protecting the node partial list, is taken when counting the
> > > free objects resident in that list. It introduces locking contention
> > > when a page is moved between the CPU and node partial lists in the
> > > allocation path on another CPU. So reading "/proc/slabinfo" can
> > > possibly block the slab allocation on another CPU for a while, 200ms
> > > in extreme cases. If the slab object is to carry a network packet,
> > > targeting the far-end disk array, it causes a block IO jitter issue.
> > > 
> > > This fixes the block IO jitter issue by caching the total inuse
> > > objects in the node in advance. The value is retrieved without taking
> > > the node partial list lock on reading "/proc/slabinfo".
> > > 
> > > ...
> > >
> > > @@ -1768,7 +1774,9 @@ static void free_slab(struct kmem_cache *s, struct page *page)
> > >  static void discard_slab(struct kmem_cache *s, struct page *page)
> > >  {
> > > -	dec_slabs_node(s, page_to_nid(page), page->objects);
> > > +	int inuse = page->objects;
> > > +
> > > +	dec_slabs_node(s, page_to_nid(page), page->objects, inuse);
> > 
> > Is this right?  dec_slabs_node(..., page->objects, page->objects)?
> > 
> > If no, we could simply pass the page* to inc_slabs_node/dec_slabs_node
> > and save a function argument.
> > 
> > If yes then why?
> 
> Thanks for your comments.
> We are happy to improve this patch based on your suggestions.
> 
> When the user reads /proc/slabinfo, in order to obtain the active_objs
> information, the kernel traverses all slabs and executes the following
> code snippet:
> 
> static unsigned long count_partial(struct kmem_cache_node *n,
> 					int (*get_count)(struct page *))
> {
> 	unsigned long flags;
> 	unsigned long x = 0;
> 	struct page *page;
> 
> 	spin_lock_irqsave(&n->list_lock, flags);
> 	list_for_each_entry(page, &n->partial, slab_list)
> 		x += get_count(page);
> 	spin_unlock_irqrestore(&n->list_lock, flags);
> 	return x;
> }
> 
> It may cause performance issues.
> 
> Christoph suggested "you could cache the value in the userspace
> application? Why is this value read continually?", but reading
> /proc/slabinfo is initiated by the user program. As a cloud provider, we
> cannot control user behavior. If a user program inadvertently executes
> cat /proc/slabinfo, it may affect other user programs.
> 
> As Christoph said: "The count is not needed for any operations. Just for
> the slabinfo output. The value has no operational value for the
> allocator itself. So why use extra logic to track it in potentially
> performance critical paths?"
> 
> In this way, could we show the approximate value of active_objs in
> /proc/slabinfo?
> 
> This is based on the following observations:
> in the discard_slab() function, page->inuse is equal to
> page->total_objects;
> in the allocate_slab() function, page->inuse is also equal to
> page->total_objects (with one exception: for kmem_cache_node,
> page->inuse equals 1);
> page->inuse only changes continuously while objects are being constantly
> allocated or released (this should be the performance-critical path
> emphasized by Christoph).
> 
> When users query the global slabinfo information, we may use
> total_objects to approximate active_objs.

Well, from one point of view, it makes no sense, because the ratio between
these two numbers is very meaningful: it's the slab utilization rate.
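
Just to illustrate what I mean: the utilization rate is simply
active_objs / num_objs, the second and third columns of /proc/slabinfo.
A rough userspace sketch that prints it per cache could look like the one
below (illustration only, not part of any patch; the parsing and error
handling are only as much as the example needs):

/*
 * Rough userspace sketch: print active_objs / num_objs per cache,
 * taken from the first three columns of /proc/slabinfo.
 */
#include <stdio.h>

int main(void)
{
	char line[512], name[64];
	unsigned long active, total;
	FILE *f = fopen("/proc/slabinfo", "r");

	if (!f) {
		perror("fopen");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		/* skip the version and header lines */
		if (line[0] == '#' || sscanf(line, "%63s %lu %lu",
					     name, &active, &total) != 3)
			continue;
		if (total)
			printf("%-24s %5.1f%%\n", name,
			       100.0 * active / total);
	}
	fclose(f);
	return 0;
}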
On the other side, with per-cpu partial lists enabled, active_objs has
nothing to do with reality anyway, so I agree with you: calling
count_partial() is almost useless.

That said, I wonder if the right thing to do is something like the patch
below?

Thanks!

Roman

--

diff --git a/mm/slub.c b/mm/slub.c
index 1d644143f93e..ba0505e75ecc 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2411,14 +2411,16 @@ static inline unsigned long node_nr_objs(struct kmem_cache_node *n)
 static unsigned long count_partial(struct kmem_cache_node *n,
 					int (*get_count)(struct page *))
 {
-	unsigned long flags;
 	unsigned long x = 0;
+#ifdef CONFIG_SLUB_CPU_PARTIAL
+	unsigned long flags;
 	struct page *page;
 
 	spin_lock_irqsave(&n->list_lock, flags);
 	list_for_each_entry(page, &n->partial, slab_list)
 		x += get_count(page);
 	spin_unlock_irqrestore(&n->list_lock, flags);
+#endif
 	return x;
 }
 #endif	/* CONFIG_SLUB_DEBUG || CONFIG_SYSFS */
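
For comparison, the caching approach discussed above (pass the in-use
count down to inc_slabs_node()/dec_slabs_node() and let the slabinfo
reader use a cached per-node counter instead of taking n->list_lock)
boils down to something like the standalone sketch below. It is an
illustration only: the fake_* names, the userspace setting and the use of
C11 atomics are assumptions made for the example, not the posted kernel
patch.

/*
 * Standalone sketch of the "cache the inuse count" idea: the counters
 * are updated where slabs are added/removed and read without taking the
 * partial-list lock, which is all the slabinfo reader needs.
 */
#include <stdatomic.h>
#include <stdio.h>

struct fake_kmem_cache_node {
	/* the partial list itself would still be lock-protected */
	atomic_long total_objects;
	atomic_long inuse_objects;
};

static void fake_inc_slabs_node(struct fake_kmem_cache_node *n,
				long objects, long inuse)
{
	atomic_fetch_add(&n->total_objects, objects);
	atomic_fetch_add(&n->inuse_objects, inuse);
}

static void fake_dec_slabs_node(struct fake_kmem_cache_node *n,
				long objects, long inuse)
{
	atomic_fetch_sub(&n->total_objects, objects);
	atomic_fetch_sub(&n->inuse_objects, inuse);
}

/* reader side: what the slabinfo output would use, no lock taken */
static long fake_node_nr_inuse(struct fake_kmem_cache_node *n)
{
	return atomic_load(&n->inuse_objects);
}

int main(void)
{
	struct fake_kmem_cache_node n;

	atomic_init(&n.total_objects, 0);
	atomic_init(&n.inuse_objects, 0);

	fake_inc_slabs_node(&n, 32, 32);	/* new slab, fully used */
	fake_dec_slabs_node(&n, 32, 32);	/* slab discarded */
	fake_inc_slabs_node(&n, 32, 1);		/* e.g. early kmem_cache_node */
	printf("inuse objects: %ld\n", fake_node_nr_inuse(&n));
	return 0;
}

The trade-off is the one Christoph pointed out upthread: the counter has
to be maintained on paths that otherwise have no use for the value, just
to make the slabinfo read cheap.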