On 7/21/21 3:18 AM, Vivek Goyal wrote: > On Tue, Jul 20, 2021 at 01:25:11PM +0800, JeffleXu wrote: >> >> >> On 7/20/21 5:30 AM, Vivek Goyal wrote: >>> On Fri, Jul 16, 2021 at 06:47:49PM +0800, Jeffle Xu wrote: >>>> This patchset adds support of per-file DAX for virtiofs, which is >>>> inspired by Ira Weiny's work on ext4[1] and xfs[2]. >>>> >>>> There are three related scenarios: >>>> 1. Alloc inode: get per-file DAX flag from fuse_attr.flags. (patch 3) >>>> 2. Per-file DAX flag changes when the file has been opened. (patch 3) >>>> In this case, the dentry and inode are all marked as DONT_CACHE, and >>>> the DAX state won't be updated until the file is closed and reopened >>>> later. >>>> 3. Users can change the per-file DAX flag inside the guest by chattr(1). >>>> (patch 4) >>>> 4. Create new files under directories with DAX enabled. When creating >>>> new files in ext4/xfs on host, the new created files will inherit the >>>> per-file DAX flag from the directory, and thus the new created files in >>>> virtiofs will also inherit the per-file DAX flag if the fuse server >>>> derives fuse_attr.flags from the underlying ext4/xfs inode's per-file >>>> DAX flag. >>> >>> Thinking little bit more about this from requirement perspective. I think >>> we are trying to address two use cases here. >>> >>> A. Client does not know which files DAX should be used on. Only server >>> knows it and server passes this information to client. I suspect >>> that's your primary use case. >> >> Yes, this is the starting point of this patch set. >> >>> >>> B. Client is driving which files are supposed to be using DAX. This is >>> exactly same as the model ext4/xfs are using by storing a persistent >>> flag on inode. >>> >>> Current patches seem to be a hybrid of both approach A and B. >>> >>> If we were to implement B, then fuse client probably needs to have the >>> capability to query FS_XFLAG_DAX on inode and decide whether to >>> enable DAX or not. (Without extra round trip). Or know it indirectly >>> by extending GETATTR and requesting this explicitly. >> >> If guest queries if the file is DAX capable or not by an extra GETATTR, >> I'm afraid this will add extra round trip. >> >> Or if we add the DAX flag (ATTR_DAX) by extending LOOKUP, as done by >> this patch set, then the FUSE server needs to set ATTR_DAX according to >> the FS_XFLAG_DAX of the backend files, *to make the FS_XFLAG_DAX flag >> previously set by FUSE client work*. Then it becomes a *mandatory* >> requirement when implementing FUSE server. i.e., it becomes part of the >> FUSE protocol rather than implementation specific. FUSE server can still >> implement some other algorithm deciding whether to set ATTR_DAX or not, >> though it must set ATTR_DAX once the backend file is flagged with >> FS_XFLAG_DAX. >> >> Besides, as you said, FUSE server needs to add one extra >> GETFLAGS/FSGETXATTR ioctl per LOOKUP in this case. To eliminate this >> cost, we could negotiate the per-file DAX capability during FUSE_INIT. >> Only when the per-file DAX capability is negotiated, will the FUSE >> server do extra GETFLAGS/FSGETXATTR ioctl and set ATTR_DAX flag when >> doing LOOKUP. >> >> >> Personally, I prefer the second way, i.e., by extending LOOKUP (adding >> ATTR_DAX), to eliminate the extra round trip. > > Negotiating a fuse feature say FUSE_FS_XFLAG_DAX makes sense. If > client is mounted with "-o dax=inode", then client will negotitate > this feature and if server does not support it, mount can fail. > > But this probably will be binding on server that it needs to return > the state of FS_XFLAG_DAX attr on inode upon lookup/getattr. I don't > this will allow server to implement its own separate policy which > does not follow FS_XFLAG_DAX xattr. That means the backend fs must be ext4/xfs supporting per-file DAX feature. If given more right to construct its own policy, FUSE server could support per-file DAX upon other backend fs, though it will always fail when virtiofs wants to set FS_XFLAG_DAX inside guest. > > IOW, I don't think server can choose to implement its own policy > for enabling dax for "-o dax=inode". > > If there is really a need to for something new where server needs > to dynamically decide which inodes should use dax (and not use > FS_XFLAG_FS), I guess that probably should be a separate mount > option say "-o dax=server" and it negotiates a separate feature > say FUSE_DAX_SERVER. Once that's negotiated, now both client > and server know that DAX will be used on files as determined > by server and not necessarily by using file attr FS_XFLAG_DAX. > > So is "dax=inode" enough for your needs? What's your requirement, > can you give little bit of more details. In our use case, the backend fs is something like SquashFS on host. The content of the file on host is downloaded *as needed*. When the file is not completely ready (completely downloaded), the guest will follow the normal IO routine, i.e., by FUSE_READ/FUSE_WRITE request. While the file is completely ready, per-file DAX is enabled for this file. IOW the FUSE server need to dynamically decide if per-file DAX shall be enabled, depending on if the file is completely downloaded. Our strategy and design is still under discussion and may change. Any comment and discussion is welcomed. > >> >>> >>> If we were only implementing A, then server does not have a way to >>> tell client to enable DAX. Server can either look at FS_XFLAG_DAX >>> and decide to enable DAX or use some other property. Given querying >>> FS_XFLAG_DAX will be an extra ioctl() on every inode lookup/getattr, >>> it probably will be a server option. But enabling on server does not >>> mean client will enable it. >> >> As I said previously, it can be negotiated whether this per-file DAX >> capability is needed. Guest can advertise this capability when '-o >> dax=inode' is configured. >> >>> >>> I think my primary concern with this patch right now is trying to >>> figure out which requirement we are trying to cater to first and how >>> to connect server and client well so they both understand what mode >>> they are operating in and interact well. >>> >> >> >> -- >> Thanks, >> Jeffle >> -- Thanks, Jeffle _______________________________________________ Virtualization mailing list Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/virtualization