Re: [PATCH] z3fold: use per-cpu unbuddied lists

Vitaly Wool <vitalywool@xxxxxxxxx> · Thu, 3 Aug 2017 01:49:38 +0200

On Aug 3, 2017 01:07, "Andrew Morton" <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
On Wed, 2 Aug 2017 12:25:05 +0200 Vitaly Wool <vitalywool@xxxxxxxxx> wrote:

> z3fold is operating on unbuddied lists in a simple manner: in fact,
> it only takes the first entry off the list on a hot path. So if the
> z3fold pool is big enough and balanced well enough, considering
> only the lists local to the current CPU won't be an issue in any
> way, while random I/O performance will go up.

Has the performance benefit been measured?  It's a large patch.

Yes, mostly by running fio in randrw mode. We see performance more than doubling on an 8-core ARM64 system.

> This patch also introduces two worker threads which: one for async
> in-page object layout optimization and one for releasing freed
> pages.

Why?  What are the runtime effects of this change?  Does this turn
currently-synchronous operations into now-async operations?  If so,
what are the implications of this if, say, the workqueue doesn't get
serviced for a while?

The biggest benefit is that we usually end up with one call to compact_page instead of two. Also, we use z3fold as a zram backend, and zram likes to free pages on a critical path, so removing compaction from this critical path is definitely a nice thing.

If the compaction workqueue doesn't get serviced for a significant while, the compression ratio will go down a bit, but nothing bad will happen. And z3fold_alloc tries to take new pages from the stale list first, so even if the release workqueue doesn't run, the pages will be reused by z3fold_alloc.

etc.  Sorry, but I'm not seeing anywhere near enough information and
testing results to justify merging such a large and intrusive patch.

I understand. Would it help if I add fio results and some explanations from this reply to the commit message?

Thanks, 
  Vitaly