On Fri, Apr 29, 2022 at 8:36 PM Gao Xiang <hsiangkao@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi Xin,
>
> On Fri, Apr 29, 2022 at 07:38:48AM +0800, Xin Yin wrote:
> > Hi Jeffle & Xiang,
> >
> > I have tested your "fscache,erofs: fscache-based on-demand read semantics"
> > v9 patch set (https://www.spinics.net/lists/linux-fsdevel/msg216178.html).
> > For now, it works fine with the nydus image-service. After the image data
> > is fully loaded to local storage, it shows a great IO performance gain
> > compared with nydus v5, which is based on FUSE.
>
> Yeah, thanks for your interest and efforts. Actually I'm pretty sure you
> could observe CPU, bandwidth and latency improvements in densely deployed
> scenarios, since our goal is to provide native performance when the data
> is ready, as well as image on-demand read and flexible cache data
> management for end users.
>
> > For 4K random reads, fscache-based erofs gets the same performance as
> > the original local filesystem. But I still saw a performance drop in the
> > 4K sequential read case, and I found the root cause: in
> > erofs_fscache_readahead() we use synchronous IO, which may stall the
> > readahead pipelining.
>
> Yeah, that is a known TODO. In principle, when such part of the data is
> locally available, it will have similar performance (bandwidth, latency,
> CPU loading) to a loop device. But we don't implement asynchronous I/O
> for now, since we need to make the functionality work first, so thanks
> for your patch addressing this.
>
> > I have tried changing to asynchronous IO during the erofs fscache
> > readahead procedure, as netfs does, and then I saw a great performance
> > gain.
> >
> > Here are my test steps and results:
> > - generate a nydus v6 format image, which stores a large file for the IO test.
> > - launch the nydus image-service, and make the image data fully loaded to local storage (ext4).
> > - run fio with the command below:
> > fio -ioengine=psync -bs=4k -size=5G -direct=0 -thread -rw=read -filename=./test_image -name="test" -numjobs=1 -iodepth=16 -runtime=60
>
> Yeah, although I can see what you mean (to test buffered I/O), the
> arguments are still somewhat messy (maybe because we don't support
> fscache-based direct I/O for now; that is another TODO, but with
> low priority).
>
> > v9 patches: 202654 KB/s
> > v9 patches + async readahead patch: 407213 KB/s
> > ext4: 439912 KB/s
>
> May I ask whether such an ext4 image is through a loop device? If not,
> that is reasonable. Anyway, it's not a big problem for now; we could
> optimize it later since it should be exactly the same eventually.
>

This ext4 test is not through a loop device; it is just the same test
file stored on native ext4.

Actually, after further tests, I can see that fscache-based erofs with
the async readahead patch almost achieves native performance in
sequential buffered read cases.

Thanks,
Xin Yin

> And I will drop a message to Jeffle for further review, since we're
> close to another 5-day national holiday.
>
> Thanks again!
> Gao Xiang
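
P.S. In case it helps further review, the direction of my change is
roughly the sketch below. This is only an illustration of the idea, not
the actual patch: struct erofs_fscache_req and all of the helper names
here are made up. The point is that instead of blocking on a completion
after each fscache_read() call, we pass a termination callback and
unlock the folios when the cache I/O actually completes, so readahead
submission can keep pipelining, similar to what netfs does.

#include <linux/fscache.h>
#include <linux/pagemap.h>
#include <linux/refcount.h>
#include <linux/slab.h>
#include <linux/xarray.h>

/* Hypothetical per-readahead request; not the actual patch's struct. */
struct erofs_fscache_req {
	struct address_space *mapping;	/* folios covered by this request */
	loff_t start;			/* file offset of the first byte */
	size_t len;			/* total bytes expected */
	refcount_t ref;			/* one ref per in-flight sub-read */
	int error;
};

/* Unlock (and mark uptodate on success) every folio under the request. */
static void erofs_fscache_req_complete(struct erofs_fscache_req *req)
{
	struct folio *folio;
	pgoff_t start_page = req->start >> PAGE_SHIFT;
	pgoff_t last_page = (req->start + req->len - 1) >> PAGE_SHIFT;
	XA_STATE(xas, &req->mapping->i_pages, start_page);

	rcu_read_lock();
	xas_for_each(&xas, folio, last_page) {
		if (xas_retry(&xas, folio))
			continue;
		if (!req->error)
			folio_mark_uptodate(folio);
		folio_unlock(folio);
	}
	rcu_read_unlock();
}

/* Termination callback, invoked when one cache sub-read finishes. */
static void erofs_fscache_req_end(void *priv, ssize_t transferred_or_error,
				  bool was_async)
{
	struct erofs_fscache_req *req = priv;

	if (IS_ERR_VALUE(transferred_or_error))
		req->error = transferred_or_error;

	/* Only the last completing sub-read unlocks the folios. */
	if (refcount_dec_and_test(&req->ref)) {
		erofs_fscache_req_complete(req);
		kfree(req);
	}
}

/* Submit one sub-read and return immediately instead of waiting. */
static int erofs_fscache_read_async(struct netfs_cache_resources *cres,
				    struct iov_iter *iter, loff_t pstart,
				    struct erofs_fscache_req *req)
{
	refcount_inc(&req->ref);
	return fscache_read(cres, pstart, iter, NETFS_READ_HOLE_FAIL,
			    erofs_fscache_req_end, req);
}

erofs_fscache_readahead() would then allocate the request with
refcount_set(&req->ref, 1), lock and attach the folios, submit all
sub-reads via erofs_fscache_read_async(), and finally drop its own
initial reference, so the folios are unlocked as soon as the last
in-flight sub-read completes, rather than waiting on each read one at
a time.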