On Thu, Sep 26, 2019 at 4:08 AM Boaz Harrosh <boaz@xxxxxxxxxxxxx> wrote:

> Performance:
> A simple fio direct 4k random write test with an incrementing number
> of threads.
>
> [fuse]
> threads  wr_iops  wr_bw    wr_lat
> 1        33606    134424   26.53226
> 2        57056    228224   30.38476
> 4        88667    354668   40.12783
> 7        116561   466245   53.98572
> 8        129134   516539   55.6134
>
> [fuse-splice]
> threads  wr_iops  wr_bw    wr_lat
> 1        39670    158682   21.8399
> 2        51100    204400   34.63294
> 4        75220    300882   47.42344
> 7        97706    390825   63.04435
> 8        98034    392137   73.24263
>
> [xfs-dax]
> threads  wr_iops  wr_bw    wr_lat

Data missing.

> [Maxdata-1.5-zufs]
> threads  wr_iops  wr_bw      wr_lat
> 1        1041802  260,450    3.623
> 2        1983997  495,999    3.808
> 4        3829456  957,364    3.959
> 7        4501154  1,125,288  5.895330
> 8        4400698  1,100,174  6.922174

Just a heads up that I have achieved similar results with a prototype
using the unmodified fuse protocol.  This prototype was built with
ideas taken from zufs (percpu/lockless, mmapped dev, single syscall
per op).

I found a big scheduler scalability bottleneck caused by the update of
mm->cpu_bitmap at context switch.  This can be worked around by using
shared memory instead of shared page tables, which is a bit of a pain,
but it does prove the point.

I thought about fixing the cpu_bitmap cacheline pingpong, but didn't
really get anywhere.

Are you interested in comparing zufs with the scalable fuse prototype?
If so, I'll push the code into a public repo with some instructions.

Thanks,
Miklos
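
The "shared memory instead of shared page tables" workaround boils down
to keeping the two sides in separate processes, each with its own mm, so
context switches stop contending on one shared mm->cpu_bitmap, while
request/reply data still travels through a common mapping.  A minimal,
self-contained C sketch of that idea using memfd_create()/mmap()/fork()
(the buffer name and layout are made up for illustration; this is not
the prototype code referred to above):

/*
 * Two cooperating processes share one memfd-backed buffer instead of
 * sharing an address space.  Each has its own mm, so per-mm state such
 * as mm->cpu_bitmap is not bounced between CPUs on every context
 * switch, yet data written by one side is visible to the other.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

#define SHM_SIZE 4096

int main(void)
{
	int fd = memfd_create("proto-buf", 0);	/* anonymous, fd-backed memory */
	if (fd < 0 || ftruncate(fd, SHM_SIZE) < 0) {
		perror("memfd");
		return 1;
	}

	char *buf = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
			 MAP_SHARED, fd, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	pid_t pid = fork();
	if (pid < 0) {
		perror("fork");
		return 1;
	}
	if (pid == 0) {
		/* "server" side: separate mm, same physical pages */
		snprintf(buf, SHM_SIZE, "reply written by pid %d", getpid());
		return 0;
	}

	/* "client" side: wait, then read the reply through the shared mapping */
	waitpid(pid, NULL, 0);
	printf("parent %d read: %s\n", getpid(), buf);
	return 0;
}

The parent sees the child's write even though the two processes never
share page tables; a real transport would of course add a ring of
request slots and a wakeup mechanism on top of such a region.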