On Tue, Jul 04, 2017 at 09:21:37AM -0400, Josef Bacik wrote:
> On Tue, Jul 04, 2017 at 12:01:00PM +0900, Minchan Kim wrote:
> > On Mon, Jul 03, 2017 at 09:50:07AM -0400, Josef Bacik wrote:
> > > On Mon, Jul 03, 2017 at 10:33:03AM +0900, Minchan Kim wrote:
> > > > Hello,
> > > >
> > > > On Fri, Jun 30, 2017 at 11:03:24AM -0400, Josef Bacik wrote:
> > > > > On Fri, Jun 30, 2017 at 11:17:13AM +0900, Minchan Kim wrote:
> > > > >
> > > > > <snip>
> > > > >
> > > > > > > Because this static step down wastes cycles.  Why loop 10 times
> > > > > > > when you could set the target at actual usage and try to get
> > > > > > > everything in one go?  Most shrinkable slabs adhere to this
> > > > > > > default in-use-first model, which means that we have to hit an
> > > > > > > object in the LRU twice before it is freed.  So in order to
> > > > > >
> > > > > > I didn't know that.
> > > > > >
> > > > > > > reclaim anything we have to scan a slab cache's entire LRU at
> > > > > > > least once before any reclaim starts happening.  If we're doing
> > > > > > > this static step down thing we
> > > > > >
> > > > > > If it's really true, I think that shrinker should be fixed first.
> > > > >
> > > > > Easier said than done.  I've fixed this for the super shrinkers, but
> > > > > like I said below, all it takes is some asshole doing
> > > > > find / -exec stat {} \; twice to put us back in the same situation
> > > > > again.  There's no aging mechanism other than memory reclaim, so we
> > > > > get into this shitty situation of aging+reclaiming at the same time.
> > > >
> > > > What's different from the normal page cache problem?
> > >
> > > What's different is that reclaiming a page from the page cache gives you
> > > a page, while reclaiming 10k objects from slab may only give you one
> > > page if you are super unlucky.  I'm nothing in life if I'm not unlucky.
> > >
> > > > It has the same problem you mentioned, so we should look at what the
> > > > VM does to address it.
> > > >
> > > > It has two LRU lists, active and inactive, and maintains a 1:1 size
> > > > ratio.  A new page goes on the inactive list, and if it is touched
> > > > twice, it is promoted to the active list, which is the same problem.
> > > > However, once reclaim is triggered, the VM can quickly move pages from
> > > > active to inactive, clearing the referenced flag, until the ratio is
> > > > matched.  So the VM can easily reclaim pages from the inactive list.
> > > >
> > > > Can we apply a similar mechanism to the problematic slabs?
> > > > How about adding shrink_slab(xxx, ACTIVE|INACTIVE) somewhere in the VM
> > > > to demote objects from the active list, and adding logic to the FS to
> > > > move an inactive object to the active list on a cache hit?
> > >
> > > I did this too!  This worked out ok, but was a bit complex, and the
> > > problem was solved just as well by dropping the INUSE-first approach.  I
> > > think that Dave's approach of having a separate aging mechanism is a
> > > good complement to these patches.
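To make that two-list idea concrete, here is a toy userspace model of what I
have in mind.  Every name in it is invented for illustration; it is not
kernel code and not what Josef implemented:

    #include <stdio.h>
    #include <stdlib.h>

    enum lru_kind { INACTIVE, ACTIVE };

    struct obj {
        struct obj *next;
        int referenced;             /* set again on every cache hit */
    };

    struct cache {
        struct obj *list[2];
        int count[2];
    };

    static void push(struct cache *c, enum lru_kind l, struct obj *o)
    {
        o->next = c->list[l];
        c->list[l] = o;
        c->count[l]++;
    }

    static struct obj *pop(struct cache *c, enum lru_kind l)
    {
        struct obj *o = c->list[l];

        if (o) {
            c->list[l] = o->next;
            c->count[l]--;
        }
        return o;
    }

    /* the shrink_slab(xxx, ACTIVE) half: demote, clearing the referenced
     * flag, until the 1:1 ratio between the two lists is restored */
    static void age_active(struct cache *c)
    {
        while (c->count[ACTIVE] > c->count[INACTIVE]) {
            struct obj *o = pop(c, ACTIVE);

            o->referenced = 0;
            push(c, INACTIVE, o);
        }
    }

    /* the shrink_slab(xxx, INACTIVE) half: two-touched objects get
     * promoted, everything else is freed immediately */
    static int scan_inactive(struct cache *c, int nr)
    {
        int freed = 0;

        while (nr-- > 0 && c->count[INACTIVE]) {
            struct obj *o = pop(c, INACTIVE);

            if (o->referenced) {
                push(c, ACTIVE, o);
            } else {
                free(o);
                freed++;
            }
        }
        return freed;
    }

    int main(void)
    {
        struct cache c = { { NULL, NULL }, { 0, 0 } };
        int i;

        for (i = 0; i < 8; i++)
            push(&c, INACTIVE, calloc(1, sizeof(struct obj)));
        c.list[INACTIVE]->referenced = 1;  /* pretend one object was hit again */

        printf("freed %d of 8\n", scan_inactive(&c, 8));
        age_active(&c);
        return 0;
    }

The point is that demotion clears the referenced flag ahead of reclaim, so a
later inactive scan can free objects without walking the whole list twice.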
> >
> > There are two problems you are trying to address.
> >
> > 1. slab *page* reclaim
> >
> > Your claim is that slab fragmentation makes it hard to reclaim a page, so
> > we need to reclaim objects more aggressively.
> >
> > Basically, aggressive scanning doesn't guarantee reclaiming a page; it
> > just increases the probability.  And given that slab works with the
> > merging feature (i.e., it mixes several object types of the same size in
> > one slab), the probability drops hugely if you bail out on a certain
> > shrinker.  So to make this work well, we would have to crank
> > aggressiveness up enough to sweep every object from every shrinker.  I
> > guess that's why your patch makes the logic very aggressive.  My concern
> > with that aggressiveness is that it reclaims all objects too early and
> > ends up voiding the caching scheme.  I'm not sure it's a gain in the end.
>
> Well the fact is what we have doesn't work, and I've been staring at this
> problem for a few months and I don't have a better solution.
>
> And keep in mind we're talking about a purely slab workload, something that
> isn't likely to be a common case.  And even if our scan target is 2x, we
> aren't going to reclaim the entire cache before we bail out.  We only scan
> in 'batch_size' chunks, which is generally 1024.

I replied to your new patchset.  It breaks fair aging.

> In the worst case, where we have one in-use object on every slab page we
> own, then yes we're fucked, but we're still fucked with the current code;
> only with the current code it'll take us 20 minutes of looping in the VM
> vs. seconds scanning the whole list twice.

Don't get me wrong.  As I replied in the first discussion, I tend to agree
with increasing aggressiveness.  What I dislike is that your patch increases
it too much via the SLAB/LRU ratio: it can reclaim all objects even when
memory pressure is not severe, simply because the LRU is small and the slab
is big.  You said we can bail out, but that breaks fair aging between
shrinkers, as I noted in my reply to the new patch.  As well, the slab/LRU
ratio is totally out of the VM's control.

IMO, the VM wants to define the hot working set as the pages that survive
roughly two full scans (1/4096 + 1/2048 + ... + 1/1), and if it cannot see
any progress within MAX_RECLAIM_RETRIES (16) such iterations, it decides to
kill some process even though there are still reclaimable pages.  Like that,
the VM needs some guideline it can guarantee.  However, with the SLAB/LRU
ratio, all objects can be scanned twice from the very beginning, which means
up to 24 full scans in one reclaim iteration (twice at each priority step).
Also, that number changes with the SLAB/LRU ratio, so it's really
unpredictable.
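To spell out that arithmetic, here is a standalone sketch of the priority
stepping; DEF_PRIORITY=12 matches the kernel, but the LRU size here is made
up:

    #include <stdio.h>

    int main(void)
    {
        long size = 1 << 20;    /* pretend LRU of ~1M pages */
        long scanned = 0;
        int priority;

        /* each step scans size >> priority: 1/4096, 1/2048, ..., 1/1 */
        for (priority = 12; priority >= 0; priority--)
            scanned += size >> priority;

        /* 1/4096 + 1/2048 + ... + 1 is just under 2, so one whole
         * priority walk covers roughly twice the list */
        printf("scanned %ld of %ld (%.4fx)\n",
               scanned, size, (double)scanned / size);
        return 0;
    }

That ~2x bound per reclaim attempt is the kind of guarantee the VM can
reason about; a slab scan target derived from the SLAB/LRU ratio has no such
bound.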
> >
> > 2. stream workload
> >
> > Your claim is that in such a workload every object can end up with the
> > INUSE flag set, so a full scan cycle is needed just to clear the flags
> > before, in the next cycle, objects can finally be reclaimed.  In that
> > situation, static incremental scanning causes a deep priority drop, which
> > wastes CPU cycles unnecessarily.
> >
> > Actually, there isn't a nice solution for that at the moment.  The page
> > cache tries to solve it with a multi-level LRU, and as you said, that
> > would solve the problem.  However, it would be too complicated, so you
> > could be okay with Dave's suggestion of periodic aging (i.e., LFU), but
> > that isn't free either, so it could increase runtime latency.
> >
> > The point is that such a workload is hard to solve in general, and plain
> > aggressive scanning is not a good solution because it can sweep other
> > shrinkers which don't have this problem, so I hope it is solved by the
> > specific shrinker itself rather than at the general VM level.
>
> The only problem I see here is that our shrinker list is just a list;
> there's no order or anything, and we just walk through it one at a time.

I don't get it.  Why do you think an ordered list solves this
stream-workload issue?

> We could mitigate this problem by ordering the list based on objects, but
> that isn't necessarily a good indication of overall size.  Consider
> xfs_buf, where each slab object is also hiding 1 page, so for every slab
> object we free we also free 1 page.  This may appear to be a smaller slab
> by object measures, but may actually be larger.  We could definitely make
> this aspect of the shrinker smarter, but these patches still need to be in
> place to solve the problem of us not being aggressive enough currently.
> Thanks,
>
> Josef
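P.S. Some back-of-envelope numbers for the xfs_buf point above.  All object
sizes and counts below are made up, just to show why object count alone is a
poor proxy for reclaimable memory:

    #include <stdio.h>

    int main(void)
    {
        /* cache A: small objects, ~21 fit in one 4K slab page */
        long a_objs = 100000, a_per_page = 4096 / 192;
        /* cache B: xfs_buf-like, each object also pins one data page */
        long b_objs = 20000, b_per_page = 4096 / 512;

        long a_pages = a_objs / a_per_page;           /* best case */
        long b_pages = b_objs / b_per_page + b_objs;  /* slab + pinned */

        printf("A: %ld objects -> at most ~%ld pages\n", a_objs, a_pages);
        printf("B: %ld objects -> at most ~%ld pages\n", b_objs, b_pages);
        return 0;
    }

With these numbers, B looks 5x smaller than A by object count but frees
roughly 4-5x more memory, so ordering shrinkers by object count alone would
get this case exactly backwards.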