Re: [PATCH v5 2/5] fuse: make DAX mount option a tri-state

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Sep 23, 2021 at 09:02:26PM -0400, Vivek Goyal wrote:
> On Fri, Sep 24, 2021 at 08:26:18AM +1000, Dave Chinner wrote:
> > On Thu, Sep 23, 2021 at 03:02:41PM -0400, Vivek Goyal wrote:
> > > On Thu, Sep 23, 2021 at 05:25:23PM +0800, Jeffle Xu wrote:
> > > > We add 'always', 'never', and 'inode' (default). '-o dax' continues to
> > > > operate the same which is equivalent to 'always'. To be consistemt with
> > > > ext4/xfs's tri-state mount option, when neither '-o dax' nor '-o dax='
> > > > option is specified, the default behaviour is equal to 'inode'.
> > > 
> > > So will "-o dax=inode" be used for per file DAX where dax mode comes
> > > from server?
> > > 
> > > I think we discussed this. It will be better to leave "-o dax=inode"
> > > alone. It should be used when we are reading dax status from file
> > > attr (like ext4 and xfs). 
> > > 
> > > And probably create separate option say "-o dax=server" where server
> > > specifies which inode should use dax.
> > 
> > That seems like a poor idea to me.
> > 
> > The server side already controls what the client side does by
> > controlling the inode attributes that the client side sees.  That
> > is, if the server is going to specify whether the client side data
> > access is going to use dax, then the server presents the client with
> > an inode that has the DAX attribute flag set on it.
> 
> Hi Dave,
> 
> Currently in fuse/virtiofs, DAX is compltely controlled by client. Server
> has no say in it. If client is mounted with "-o dax", dax is enabled on
> all files otherwise dax is disabled on all files. One could think of
> implementing an option on server so that server could deny mmap()
> requests that come from client, but so far nobody asked for such
> an option on server side.

Can you please refer to the new options "always|never|inode" in
disucssions, because the "-o dax" option is deprecated and we really
need to center discussions around the new options, not the
deprecated one.

> When you say "inode that has DAX attribute flag set on it", are you
> referring to "S_DAX (in inode->i_flags)" or persistent attr
> "FS_XFLAG_DAX"?

Neither. The S_DAX flags is the VFS struct inode state that tells
the kernel what to do with that inode. The FS_XFLAG_DAX is the
userspace API status/control flag that represents the current state
of the persistent per-inode DAX control attribute.

What I'm talking about the persistent attribute that is present
on stable storage. e.g. in the XFS on-disk inode the flag is
XFS_DIFLAG2_DAX which is never seen outside low level XFS code. In
the case of FUSE, this is the persistent server-side DAX attribute
state. The client gets this information from the server via an
inode attribute defined in the FUSE protocol. The client translates
that network protocol attribute to S_DAX status on FS inode
instantiation, and to/from FS_XFLAG_DAX for user control under
dax=inode.

In the case that the user changes FS_XFLAG_DAX, the FUSE client
needs to communicate that attribute change to the server, where the
server then changes the persistent state of the on-disk inode so
that the next time the client requests that inode, it gets the state
it previously set. Unless, of course, there are server side policy
overrides (never/always).

IOWs, dax=inode on the client side is essentially "follow server
side policy" because the server maintains the persistent DAX flag
state in the server side inode...

> As of now S_DAX on client side inode is set by fuse client whenever
> client mounted filesystem with "-o dax". And I think you are assuming

Of course. Similarly, if the client uses dax=never, the S_DAX is
never set on the VFS inode. But if dax=inode is set, where does the
DAX attribute state that determines S_DAX comes from .....?

> that DAX attribute of inode is already coming from server. That's not
> the case. In fact that seems to be the proposal. Provide capability
> so that server can specify which inode should be using DAX and which
> inode should not be.

Yup, that's exactly what I'm talking about - this is "dax=inode"
behaviour, because the client follows what the server tells it
in the inode attributes that arrive over the wire.

> > In that case, turning off dax on the guest side should be
> > communicated to the fuse server so the server turns off the DAX flag
> > on the server side iff server side policy allows it.
> 
> Not sure what do you mean by server turns of DAX flag based on client
> turning off DAX. Server does not have to do anything. If client decides
> to not use DAX (in guest), it will not send FUSE_SETUPMAPPING requests
> to server and that's it.

Where does the client get it's per-inode DAX policy from if
dax=inode is, like other DAX filesystems, the default behaviour?

Where is the persistent storage of that per-inode attribute kept?

> > When the guest
> > then purges it's local inode and reloads it from the server then it
> > will get an inode with the DAX flag set according to server side
> > policy.
> 
> So right now we don't have a mechanism for server to specify DAX flag.
> And that's what this patch series is trying to do.
> 
> > 
> > Hence if the server side is configured with "dax=always" or
> > dax="never" semantics it means the client side inode flag state
> > cannot control DAX mode. That means, regardless of the client side
> > mount options, DAX is operating under always or never policy,
> 
> Hmm..., When you say "server side is configured with "dax=always", 
> do you mean shared directory on host is mounted with "-o dax=always",
> or you mean some virtiofs server specific option which enables
> dax on all inodes from server side.

I don't care. That's a server side implementation detail. You can
keep it in a private xattr for all I care.

> In general, DAX on host and DAX inside guest are completely independent.
> Host filesystem could very well be mounted with dax or without dax and
> that has no affect on guests's capability to be able to enable DAX or
> not. 

If you have both the host and guest accessing the same files with
different modes, then you have a data coherency problem that is
guaranteed to corrupt data.

> > seems to be exactly what 'dax=inode' behaviour means on the client
> > side - it behaves according to how the server side propagates the
> > DAX attribute to the client for each inode.
> 
> Ok. So "-o dax=inode" in fuse will have a different meaning as opposed
> to ext4/xfs. This will mean that server will pass DAX state of inode
> when inode is instantiated and client should honor that. 

No, that's exactly what dax=inode means: DAX behaviour is
per-inode state that users must probe via statx() to determine if
dax is active or not.

> But what about FS_XFLAG_DAX flag then. Say host file system
> does support this att and fuse/virtiofs allows querying and
> setting this attribute (I don't think it is allowed now).

FS_XFLAG_DAX is the ioctl-based control API for the client side.
It's not a persistent flag.

> So will we not create a future conflict where in case of
> fuse/virtiofs "-o dax=inode" means something different and it does
> look at FS_XFLAG_DAX file attr.

Like many, I suspect you might be mis-understanding the DAX API.

There are 4 parts to it:

- persistent filesystem inode state flag
- ioctls to manipulate persistent inode state flag (FS_XFLAG_DAX)
- mount options to override persistent inode state flag
- VFS inode state indicating DAX is active (S_DAX, STATX_ATTR_DAX)

You seem to be conflating the user API FS_XFLAG_DAX flag with
whatever the filesystem uses to physically storage that state. They
are not the same thing. FS_XFLAG_DAX also has no correlation with
the mount options - we can change the on-disk flag state regardless
of the mount option in use. We can even change the on-disk state
flag on *kernels that don't support DAX*.  Changing the on-disk
attribute *does not change S_DAX*.

That's because the on-disk persistent state is a property of the
filesystem's on-disk format, not a property of the kernel that is
running on that machine. The persistent state flag is managed as a
completely independent filesystem inode attribute, but it only
affects behaviour when the dax=inode mode has been selected.

Control of the behaviour by this persistent inode flag is what
dax=inode means. It does not define how that attribute flag is
managed, it just defines the policy that an the inode's behaviour
will be determined by it's dax attribute state.

OTOH, S_DAX is used by the kernel to enable DAX access. It is the
_mechanism_, not the policy. We control S_DAX by mount option and/or
the filesystems persistent inode state flag, and behaviour is
determined by these policies at VFS inode instantiation time. Change
the policy, turf the inode out of cache and re-instantiate it, and
S_DAX for that inode is recalculated from the new policy.

For FUSE, the server provides the persistent storage and nothing
else. The ioctls to manipulate dax state are client side ioctls, as
are the mount options and the S_DAX vfs inode state. Hence for
server side management of the per-inode S_DAX state, the FUSE
protocol needs to be able to pass the per-inode persistent DAX
attribute state to the client and the client needs to honor
that attribute from the server.

How the FUSE server stores this persistent attribute is up to the
server implementation. If the server wants FUSE access to be
independent of host access, it can't use the persistent inode flags
in the host filesystem - it will need to use it's own xattrs. If the
server wants host access to be coherent with the guest, then it will
need to implement that in a different way.

But the FUSE client doesn't care about any of this server side
policy mumbo-jumbo - it just needs to do what the server sends
to it in the DAX inode attribute. And that's exactly what dax=inode
means...

> In summary, there seem to be two use cases.
> 
> A. virtiofsd/fuse-server wants do be able to come up with its own
> policy to decide which inodes should use guest.
> 
> B. guest client decides which inode use DAX based on FS_XFLAG_DAX
> attr on inode (and server does not play a role).
> 
> To be able to different between these two cases, I was suggesting
> using "-o dax=inode" for B and "-o dax=server" for A.

dax=inode covrees both these cases.  All the server side needs to do
is set the inode attribute according to policy, and all the client
needs to do is obey the server side per-inode attribute. IOWs,
client side using "dax=inode" means the server side controls the DAX
behaviour via the FUSE DAX attribute. 

If the server wants the client to always use DAX, then it always
sets the FUSE inode attribute. If the server says "never use DAX",
then it never sets the FUSE inode attribute.  If the server doesn't
want the client to control policy, then it just rejects attempts to
change the per-inode persistent DAX flag via client side
ioctl(FS_XFLAG_DAX) calls. Hence we have use case A completely
covered.

For case B, which is true dax=inode behaviour at the client, then
the server side honours the client side requests to change the
persistent FUSE DAX inode attribute via client side FS_XFLAG_DAX
ioctls.

See? At the client side, there is absolutely no difference between
server side policy control and true dax=inode support. The client is
completely unaware of server side policies and that's how the client
side should be implemented. The applications have to be prepared for
policy to change dynamically with either dax=server or dax=inode, so
there's no difference to applications running in the guest either.

Hence I just don't see the justification for a special DAX mode from
an architectural POV. It's no more work to implement per-inode DAX
properly form the start and we don't end up with a special, subtly
different mode.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux