Hi Xin,

On Fri, Apr 29, 2022 at 07:38:48AM +0800, Xin Yin wrote:
> Hi Jeffle & Xiang
>
> I have tested your fscache,erofs: fscache-based on-demand read semantics
> v9 patches sets https://www.spinics.net/lists/linux-fsdevel/msg216178.html.
> For now , it works fine with the nydus image-service. After the image data
> is fully loaded to local storage, it does have great IO performance gain
> compared with nydus V5 which is based on fuse.

Yeah, thanks for your interest and efforts. Actually I'm pretty sure you
could observe CPU, bandwidth and latency improvements in densely deployed
scenarios, since our goal is to provide native performance when the data is
ready, as well as on-demand image reads and flexible cache data management
for end users.

>
> For 4K random read , fscache-based erofs can get the same performance with
> the original local filesystem. But I still saw a performance drop in the 4K
> sequential read case. And I found the root cause is in erofs_fscache_readahead()
> we use synchronous IO , which may stall the readahead pipelining.
>

Yeah, that is a known TODO. In principle, when that part of the data is
locally available, it will have similar performance (bandwidth, latency,
CPU load) to a loop device. But we haven't implemented asynchronous I/O
yet, since we need to make the functionality work first, so thanks for
your patch addressing this.

> I have tried to change to use asynchronous io during erofs fscache readahead
> procedure, as what netfs did. Then I saw a great performance gain.
>
> Here are my test steps and results:
> - generate nydus v6 format image , in which stored a large file for IO test.
> - launch nydus image-service , and make image data fully loaded to local storage (ext4).
> - run fio with below cmd.
> fio -ioengine=psync -bs=4k -size=5G -direct=0 -thread -rw=read -filename=./test_image -name="test" -numjobs=1 -iodepth=16 -runtime=60

Yeah, although I can see what you mean (to test buffered I/O), the
arguments are still somewhat messy (maybe because we don't support
fscache-based direct I/O for now. That is another TODO, but with low
priority.)

>
> v9 patches: 202654 KB/s
> v9 patches + async readahead patch: 407213 KB/s
> ext4: 439912 KB/s

May I ask if that ext4 image goes through a loop device? If not, that is
reasonable. Anyway, it's not a big problem for now; we could optimize it
later, since it should end up exactly the same.

And I will drop a message to Jeffle for further review, since we're close
to another 5-day national holiday.

Thanks again!
Gao Xiang
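
For reference, the relative numbers behind the three fio results quoted
above can be worked out with a quick script. This is only a sketch: the
variable names are made up for illustration, and the only inputs are the
three KB/s figures from the report, so it says nothing beyond that single
run.

```python
# Throughput figures (KB/s) copied from the quoted fio results.
# The variable names are illustrative, not from any tool.
v9_sync_kbps = 202654      # v9 patches (synchronous readahead)
v9_async_kbps = 407213     # v9 patches + async readahead patch
ext4_kbps = 439912         # plain ext4 baseline

# How much the async readahead patch improves on the v9 patches alone.
speedup = v9_async_kbps / v9_sync_kbps

# How close each variant gets to the ext4 baseline.
sync_vs_ext4 = v9_sync_kbps / ext4_kbps
async_vs_ext4 = v9_async_kbps / ext4_kbps

print(f"async vs sync readahead: {speedup:.2f}x")              # 2.01x
print(f"v9 (sync) relative to ext4: {sync_vs_ext4 * 100:.1f}%")    # 46.1%
print(f"v9 + async relative to ext4: {async_vs_ext4 * 100:.1f}%")  # 92.6%
```

So in this run the async patch roughly doubles 4K sequential read
throughput and leaves a gap of about 7% to ext4.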