Re: [PATCH v2] z3fold: add shrinker

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 13 Oct 2016 11:20:06 +1100

On Wed, Oct 12, 2016 at 10:26:34AM +0200, Vitaly Wool wrote:
> On Wed, 12 Oct 2016 09:52:06 +1100
> Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> 
> <snip>
> > 
> > > +static unsigned long z3fold_shrink_scan(struct shrinker *shrink,
> > > +				struct shrink_control *sc)
> > > +{
> > > +	struct z3fold_pool *pool = container_of(shrink, struct z3fold_pool,
> > > +						shrinker);
> > > +	struct z3fold_header *zhdr;
> > > +	int i, nr_to_scan = sc->nr_to_scan;
> > > +
> > > +	spin_lock(&pool->lock);
> > 
> > Do not do this. Shrinkers should not run entirely under a spin lock
> > like this - it causes scheduling latency problems and when the
> > shrinker is run concurrently on different CPUs it will simply burn
> > CPU doing no useful work. Especially, in this case, as each call to
> > z3fold_compact_page() may be copying a significant amount of data
> > around and so there is potentially a /lot/ of work being done on
> > each call to the shrinker.
> > 
> > If you need compaction exclusion for the shrinker invocation, then
> > please use a sleeping lock to protect the compaction work.
> 
> Well, as far as I recall, spin_lock() will resolve to a sleeping lock
> for PREEMPT_RT,

Irrelevant for mainline kernels....

> so it is not that much of a problem for configurations
> which do care much about latencies.

That's an incorrect assumption. Long spinlock holds prevent
scheduling on that CPU, and so we still get latency problems.

> Please also note that the time
> spent in the loop is deterministic since we take not more than one entry
> from every unbuddied list.

So the loop is:

	for_each_unbuddied_list_down(i, NCHUNKS - 3) {

NCHUNKS = (PAGE_SIZE - (1 << (PAGE_SHIFT - 6)) >> (PAGE_SHIFT - 6)

So for 4k page, NCHUNKS = (4096 - (1<<6)) >> 6, which is 63. So,
potentially 60 memmoves under a single spinlock on a 4k page
machine. That's a lot of work, especially as some of those memmoves
are going to move a large amount of data in the page.

And if we consider 64k pages, we've now got NCHUNKS = 1023, which
means your shrinker is not, by default, going to scan all your
unbuddied lists because it will expire nr_to_scan (usually
SHRINK_BATCH = 128) before it's got through all of them. So not only
will the shrinker do too much under a spinlock, it won't even do
what you want it to do correctly on such setups.

Further, the way nr_to_scan is decremented and the shrinker return
value are incorrect. nr_to_scan is not the /number of objects to
free/, but the number of objects to /check for reclaim/. The
shrinker is then supposed to return the number it frees (or
compacts) to give feedback to the shrinker infrastructure about how
much reclaim work is being done (i.e. scanned vs freed ratio). This
code always returns 0, which tells the shrinker infrastructure that
it's not making progress...

> What I could do though is add the following piece of code at the end of
> the loop, right after the /break/:
> 		spin_unlock(&pool->lock);
> 		cond_resched();
> 		spin_lock(&pool->lock);
> 
> Would that make sense for you?

Not really, because it ignores the fact that shrinkers can (and
often do) run concurrently on multiple CPUs, and so serialising them
all on a spinlock just causes contention, even if you do this.

Memory reclaim is only as good as the worst shrinker it runs. I
don't care what your subsystem does, but if you're implementing a
shrinker then it needs to play by memory reclaim and shrinker
context rules.....

> > >  *****************/
> > > @@ -234,6 +335,13 @@ static struct z3fold_pool *z3fold_create_pool(gfp_t gfp,
> > >  		INIT_LIST_HEAD(&pool->unbuddied[i]);
> > >  	INIT_LIST_HEAD(&pool->buddied);
> > >  	INIT_LIST_HEAD(&pool->lru);
> > > +	pool->shrinker.count_objects = z3fold_shrink_count;
> > > +	pool->shrinker.scan_objects = z3fold_shrink_scan;
> > > +	pool->shrinker.seeks = DEFAULT_SEEKS;
> > > +	if (register_shrinker(&pool->shrinker)) {
> > > +		pr_warn("z3fold: could not register shrinker\n");
> > > +		pool->no_shrinker = true;
> > > +	} 
> > 
> > Just fail creation of the pool. If you can't register a shrinker,
> > then much bigger problems are about to happen to your system, and
> > running a new memory consumer that /can't be shrunk/ is not going to
> > help anyone.
> 
> I don't have a strong opinion on this but it doesn't look fatal to me
> in _this_ particular case (z3fold) since even without the shrinker, the
> compression ratio will never be lower than the one of zbud, which
> doesn't have a shrinker at all.

Either your subsystem needs a shrinker or it doesn't. If it needs
one, then it should follow the accepted norm of failing
initialisation because a shrinker register failure is indicative of
a serious memory allocation problem already occurring in the
machine. We can only make it worse by continuing without a
shrinker....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>