Re: [RFC][PATCH v3 1/5] mm/zsmalloc: introduce class auto-compaction

Sergey Senozhatsky <sergey.senozhatsky.work@xxxxxxxxx> · Tue, 15 Mar 2016 10:33:03 +0900

On (03/15/16 09:46), Minchan Kim wrote:
[..]
> > yes,
> > 
> > we do less work this way - scan and compact only one class, instead
> > of locking and compacting all of them; which sounds reasonable.
> 
> Hmm, it consumes more memory (i.e., sizeof(work_struct) + sizeof(void *)
> + sizeof(bool) * NR_CLASS) as well as kicking many works, up to NR_CLASS.

yes, it does. not really happy with it either.

> I didn't test your patch but I guess I can make worst case scenario.
> 
> * make every class fragmented under 40%
> * On the 40% boundary, repeated alloc/free of every class so every free
>   can schedule work if it was not scheduled.
> * Although class fragment is too high, it's not a problem if the class
>   consumes small amount of memory.

hm, in this scenario both solutions are less than perfect. if we cross
the 40% margin X times, we end up with X*NR_CLASS compaction scans.
the difference is that we queue fewer works, yes, but we don't
have to use a workqueue in the first place; compaction can be done
asynchronously by a pool's dedicated kthread, so we would just
wake_up() that thread.

> I guess it can make degradation if I try to test on zsmalloc
> microbenchmark.
> 
> As well, although I don't know workqueue internals well, these days
> I saw a few mails related to workqueue (maybe vmstat) and it had
> some trouble if system memory pressure is heavy, IIRC.

yes, you are right. wq provides the WQ_MEM_RECLAIM bit for this
case -- a workqueue created with it gets a dedicated rescuer kthread
that is woken up to process works under memory pressure.

> My approach is as follows, for example.
>
> Let's make a global ratio. Let's say it's 4M.

ok. should it depend on pool size?  min(20% of pool_size, XXMB)?
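
roughly, something like this (just a sketch: frag_threshold() and
cap_bytes are made-up names, and the cap is left as a parameter
because I have not picked a value for it):

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of zsmalloc: pick the fragmentation
 * threshold as min(20% of pool size, cap). Both arguments are in
 * bytes; the cap is supplied by the caller.
 */
static size_t frag_threshold(size_t pool_bytes, size_t cap_bytes)
{
	size_t by_pool = pool_bytes / 5;	/* 20% of the pool */

	return by_pool < cap_bytes ? by_pool : cap_bytes;
}
```

so a small pool gets a proportionally small threshold, and a big pool
is limited by the cap.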

> If zs_free (or something) realizes current fragment is over 4M,
> kick compaction background job.

yes, zs_free() is the only place that introduces fragmentation.
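
a toy model of that check, to make sure we mean the same thing. nothing
here is zsmalloc code: struct toy_pool, toy_free_account() and the
"pending" flag are all made up, and counting every freed byte as
fragmentation is a simplification.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-pool state, for illustration only. */
struct toy_pool {
	size_t frag_bytes;		/* current estimate of fragmented bytes */
	size_t threshold;		/* kick compaction above this */
	bool compaction_pending;	/* background job already kicked */
};

/*
 * Account a free and report whether the caller should kick the
 * background job. The job is kicked at most once until something
 * clears compaction_pending again.
 */
static bool toy_free_account(struct toy_pool *pool, size_t freed_bytes)
{
	pool->frag_bytes += freed_bytes;

	if (pool->frag_bytes <= pool->threshold || pool->compaction_pending)
		return false;

	pool->compaction_pending = true;
	return true;
}
```

the pending flag is what keeps repeated frees around the boundary from
kicking the job over and over.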

> The job scans from highest to lower class and compact zspages
> in each size_class until it meets high watermark(e.g, 4M + 4M /2 =
> 6M fragment ratio).

ok.

> And in the middle of background compaction, if we find it has
> scanned too much (e.g., 256 zspages or something), just bail out of
> the job for the latency and reschedule it for next time. At the next
> time, we can continue from the last size class.

ok. I'd probably prefer simpler rules here:
-- bail out because it has compacted XXMB
   so the fragmentation ratio is *expected* to be below the watermark
-- nothing to scan anymore
   compaction is executed concurrently with zs_free()/zs_malloc()
   calls, it's harder to control/guarantee some global state.

overall, no real objections. this approach can work, I think. need
to test it.
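
the two bail-out rules above, as a userspace toy (not a patch; struct
toy_class and toy_compact_pass() are made-up names, and a class is
reduced to a single "reclaimable bytes" counter):

```c
#include <stddef.h>

/* Toy stand-in for a size class: only tracks reclaimable bytes. */
struct toy_class {
	size_t reclaimable;	/* bytes compaction could free in this class */
};

/*
 * One background pass: walk classes from the highest index down and
 * stop as soon as either
 *   - @budget bytes have been compacted, or
 *   - there is nothing left to scan.
 * Returns the number of bytes compacted in this pass.
 */
static size_t toy_compact_pass(struct toy_class *classes, size_t nr,
			       size_t budget)
{
	size_t freed = 0;
	size_t i;

	for (i = nr; i > 0 && freed < budget; i--) {
		struct toy_class *c = &classes[i - 1];
		size_t take = c->reclaimable;

		/* never compact more than what is left of the budget */
		if (take > budget - freed)
			take = budget - freed;

		c->reclaimable -= take;
		freed += take;
	}

	return freed;
}
```

a pass that returns less than the budget means it ran out of classes
to scan; there is no attempt to re-check global state in between.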

> I know your concern is unnecessary scan but I'm not sure it can
> affect performance although we try to evaluate performance with
> microbenchmark. It just loops and checks with zs_can_compact
> for 255 size classes.

	-ss
