On 8/17/21 10:08 PM, Miklos Szeredi wrote:
> On Tue, 17 Aug 2021 at 15:22, JeffleXu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
>>
>>
>>
>> On 8/17/21 8:39 PM, Vivek Goyal wrote:
>>> On Tue, Aug 17, 2021 at 10:06:53AM +0200, Miklos Szeredi wrote:
>>>> On Tue, 17 Aug 2021 at 04:22, Jeffle Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>> This patchset adds support of per-file DAX for virtiofs, which is
>>>>> inspired by Ira Weiny's work on ext4[1] and xfs[2].
>>>>
>>>> Can you please explain the background of this change in detail?
>>>>
>>>> Why would an admin want to enable DAX for a particular virtiofs file
>>>> and not for others?
>>>
>>> Initially I thought that they needed it because they are downloading
>>> files on the fly from server. So they don't want to enable dax on the file
>>> till file is completely downloaded.
>>
>> Right, it's our initial requirement.
>>
>>
>>> But later I realized that they should
>>> be able to block in FUSE_SETUPMAPPING call and make sure associated
>>> file section has been downloaded before returning and solve the problem.
>>> So that can't be the primary reason.
>>
>> Saying we want to access 4KB of one file inside guest, if it goes
>> through FUSE request routine, then the fuse daemon only need to download
>> this 4KB from remote server. But if it goes through DAX, then the fuse
>> daemon need to download the whole DAX window (e.g., 2MB) from remote
>> server, so called amplification. Maybe we could decrease the DAX window
>> size, but it's a trade off.
>
> That could be achieved with a plain fuse filesystem on the host (which
> will get 4k READ requests for accesses to mapped area inside guest).
> Since this can be done selectively for files which are not yet
> downloaded, the extra layer wouldn't be a performance problem.
>
> Is there a reason why that wouldn't work?

I wasn't aware of this mechanism (working around it from user space)
before sending this patch set.
After learning more about virtualization and KVM, I found that, as
Vivek Goyal replied in [1], virtiofsd/qemu would need to somehow hook the
user page fault and then download the remaining part.

IMHO, this mechanism (implementing a plain fuse filesystem on the host,
as you proposed) seems a little bit complicated so far.

[1] https://lore.kernel.org/linux-fsdevel/YR08KnP8cO8LjKY7@xxxxxxxxxx/

--
Thanks,
Jeffle