Re: Persistent FUSE file handles (Was: virtiofs uuid and file handles)

On Mon, Sep 12, 2022 at 10:56 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>
> On Mon, Sep 12, 2022 at 06:07:42PM +0300, Amir Goldstein wrote:
> > On Mon, Sep 12, 2022 at 5:35 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Sep 12, 2022 at 04:38:48PM +0300, Amir Goldstein wrote:
> > > > On Mon, Sep 12, 2022 at 4:16 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Sun, Sep 11, 2022 at 01:14:49PM +0300, Amir Goldstein wrote:
> > > > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > One proposal was to add  LOOKUP_HANDLE operation that is similar to
> > > > > > > LOOKUP except it takes a {variable length handle, name} as input and
> > > > > > > returns a variable length handle *and* a u64 node_id that can be used
> > > > > > > normally for all other operations.
> > > > > > >
> > > > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse
> > > > > > > based fs) would be that userspace need not keep a refcounted object
> > > > > > > around until the kernel sends a FORGET, but can prune its node ID
> > > > > > > based cache at any time.   If that happens and a request from the
> > > > > > > client (kernel) comes in with a stale node ID, the server will return
> > > > > > > -ESTALE and the client can ask for a new node ID with a special
> > > > > > > lookup_handle(fh, NULL).
> > > > > > >
> > > > > > > Disadvantages being:
> > > > > > >
> > > > > > >  - cost of generating a file handle on all lookups
> > > > > > >  - cost of storing file handle in kernel icache
> > > > > > >
> > > > > > > I don't think either of those are problematic in the virtiofs case.
> > > > > > > The cost of having to keep fds open while the client has them in its
> > > > > > > cache is much higher.
> > > > > > >
> > > > > >
> > > > > > I was thinking of taking a stab at LOOKUP_HANDLE for a generic
> > > > > > implementation of persistent file handles for FUSE.
> > > > >
> > > > > Hi Amir,
> > > > >
> > > > > I was going through the proposal above for LOOKUP_HANDLE and I was
> > > > > wondering how nodeid reuse is handled.
> > > >
> > > > LOOKUP_HANDLE extends the 64-bit node id to a variable-size id.
> > >
> > > Ok. So this variable size id is basically file handle returned by
> > > host?
> > >
> > > So this looks a little different from what Miklos had suggested. IIUC,
> > > he wanted LOOKUP_HANDLE to return both a file handle as well as a *node id*.
> > >
> > > *********************************
> > > One proposal was to add  LOOKUP_HANDLE operation that is similar to
> > > LOOKUP except it takes a {variable length handle, name} as input and
> > > returns a variable length handle *and* a u64 node_id that can be used
> > > normally for all other operations.
> > > ***************************************
> > >
> >
> > Ha! Thanks for reminding me about that.
> > It's been a while since I looked at what actually needs to be done.
> > That means that evicting server inodes from cache may not be as
> > easy as I had imagined.
> >
> > > > A server that declares support for LOOKUP_HANDLE must never
> > > > reuse a handle.
> > > >
> > > > That's the basic idea. Just as a filesystem that declares to support
> > > > exportfs must never reuse a file handle.
> > >
> > > >
> > > > > IOW, if server decides to drop
> > > > > nodeid from its cache and reuse it for some other file, how will we
> > > > > differentiate between two. Some sort of generation id encoded in
> > > > > nodeid?
> > > > >
> > > >
> > > > That's usually the way that file handles are implemented in
> > > > local fs. The inode number is the internal lookup index and the
> > > > generation part is advanced on reuse.
> > > >
> > > > But for a passthrough fs like virtiofsd, LOOKUP_HANDLE will
> > > > just use the native fs file handles, so virtiofsd can evict the inode
> > > > entries from its cache completely, not only close the open fds.
> > >
> > > Ok, got it. Will be interesting to see how the kernel fuse changes look
> > > to accommodate this variable sized nodeid.
> > >
> >
> > It may make sense to have a FUSE protocol dialect where nodeid
> > is variable size for all commands, but it probably won't be part of
> > the initial LOOKUP_HANDLE work.
> >
> > > >
> > > > That is what my libfuse_passthrough POC does.
> > >
> > > Where have you hosted corresponding kernel changes?
> > >
> >
> > There are no kernel changes.
> >
> > For xfs and ext4 I know how to implement open_by_ino()
> > and I know how to parse the opaque fs file handle to extract
> > ino+generation from it and return them in FUSE_LOOKUP
> > response.
>
> Aha, interesting. So this is filesystem specific. Works on xfs/ext4 but
> not necessarily on other filesystems like nfs. (Because they have their
> own way of encoding things in file handles.)

Correct.
If it is not xfs/ext4, the passthrough implementation uses open fds.
In the non-xfs/ext4 case, an NFS client may get ESTALE errors after
a server restart and, worse, get the content of the wrong object
if the server has reused a nodeid+generation after restart.

>
> >
> > So I could manage to implement persistent NFS file handles
> > over the existing FUSE protocol with 64bit node id.
>
> And that explains why you did not have to make kernel changes. But
> does this not prevent the server from closing the fd associated with the
> nodeid? Or is there a way for the server to generate a file handle and
> then call open_by_handle_at()?
>

Yes, that's what the server does in the case of ext4/xfs:
https://github.com/amir73il/libfuse/blob/fuse_passthrough/passthrough/fuse_passthrough.cpp#L912

Thanks,
Amir.


