Re: virtiofs uuid and file handles

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jun 1, 2021 at 5:49 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>
> On Mon, May 31, 2021 at 09:12:59PM +0300, Amir Goldstein wrote:
> > On Mon, May 31, 2021 at 5:11 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> > >
> > > On Sat, 29 May 2021 at 18:05, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > >
> > > > On Wed, Sep 23, 2020 at 2:12 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Wed, Sep 23, 2020 at 11:57 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > > I think that the proper was to implement reliable persistent file
> > > > > > > > handles in fuse/virtiofs would be to add ENCODE/DECODE to
> > > > > > > > FUSE protocol and allow the server to handle this.
> > > > > > >
> > > > > > > Max Reitz (Cc-d) is currently looking into this.
> > > > > > >
> > > > > > > One proposal was to add  LOOKUP_HANDLE operation that is similar to
> > > > > > > LOOKUP except it takes a {variable length handle, name} as input and
> > > > > > > returns a variable length handle *and* a u64 node_id that can be used
> > > > > > > normally for all other operations.
> > > > > > >
> > > >
> > > > Miklos, Max,
> > > >
> > > > Any updates on LOOKUP_HANDLE work?
> > > >
> > > > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse
> > > > > > > based fs) would be that userspace need not keep a refcounted object
> > > > > > > around until the kernel sends a FORGET, but can prune its node ID
> > > > > > > based cache at any time.   If that happens and a request from the
> > > > > > > client (kernel) comes in with a stale node ID, the server will return
> > > > > > > -ESTALE and the client can ask for a new node ID with a special
> > > > > > > lookup_handle(fh, NULL).
> > > > > > >
> > > > > > > Disadvantages being:
> > > > > > >
> > > > > > >  - cost of generating a file handle on all lookups
> > > > > >
> > > > > > I never ran into a local fs implementation where this was expensive.
> > > > > >
> > > > > > >  - cost of storing file handle in kernel icache
> > > > > > >
> > > > > > > I don't think either of those are problematic in the virtiofs case.
> > > > > > > The cost of having to keep fds open while the client has them in its
> > > > > > > cache is much higher.
> > > > > > >
> > > > > >
> > > > > > Sounds good.
> > > > > > I suppose flock() does need to keep the open fd on server.
> > > > >
> > > > > Open files are a separate issue and do need an active object in the server.
> > > > >
> > > > > The issue this solves  is synchronizing "released" and "evicted"
> > > > > states of objects between  server and client.  I.e. when a file is
> > > > > closed (and no more open files exist referencing the same object) the
> > > > > dentry refcount goes to zero but it remains in the cache.   In this
> > > > > state the server could really evict it's own cached object, but can't
> > > > > because the client can gain an active reference at any time  via
> > > > > cached path lookup.
> > > > >
> > > > > One other solution would be for the server to send a notification
> > > > > (NOTIFY_EVICT) that would try to clean out the object from the server
> > > > > cache and respond with a FORGET if successful.   But I sort of like
> > > > > the file handle one better, since it solves multiple problems.
> > > > >
> > > >
> > > > Even with LOOKUP_HANDLE, I am struggling to understand how we
> > > > intend to invalidate all fuse dentries referring to ino X in case the server
> > > > replies with reused ino X with a different generation that the one stored
> > > > in fuse inode cache.
> > > >
> > > > This is an issue that I encountered when running the passthrough_hp test,
> > > > on my filesystem. In tst_readdir_big() for example, underlying files are being
> > > > unlinked and new files created reusing the old inode numbers.
> > > >
> > > > This creates a situation where server gets a lookup request
> > > > for file B that uses the reused inode number X, while old file A is
> > > > still in fuse dentry cache using the older generation of real inode
> > > > number X which is still in fuse inode cache.
> > > >
> > > > Now the server knows that the real inode has been rused, because
> > > > the server caches the old generation value, but it cannot reply to
> > > > the lookup request before the old fuse inode has been invalidated.
> > > > IIUC, fuse_lowlevel_notify_inval_inode() is not enough(?).
> > > > We would also need to change fuse_dentry_revalidate() to
> > > > detect the case of reused/invalidated inode.
> > > >
> > > > The straightforward way I can think of is to store inode generation
> > > > in fuse_dentry. It won't even grow the size of the struct.
> > > >
> > > > Am I over complicating this?
> > >
> > > In this scheme the generation number is already embedded in the file
> > > handle.  If LOOKUP_HANDLE returns a nodeid that can be found in the
> > > icache, but which doesn't match the new file handle, then the old
> > > inode will be marked bad and a new one allocated.
> > >
> > > Does that answer your worries?  Or am I missing something?
> >
> > It affirms my understanding of the future implementation, but
> > does not help my implementation without protocol changes.
> > I thought I could get away without LOOKUP_HANDLE for
> > underlying fs that is able to resolve by ino, but seems that I still have an
> > unhandled corner case, so will need to add some kernel patch.
> > Unless there is already a way to signal from server to make the
> > inode bad in a synchronous manner (I did not find any) before
> > replying to LOOKUP with a new generation of the same ino.
> >
> > Any idea about the timeline for LOOKUP_HANDLE?
> > I may be able to pick this up myself if there is no one actively
> > working on it or plans for anyone to make this happen.
>
> AFAIK, right now max is not actively looking into LOOKUP_HANDLE.
>
> To solve the issue of virtiofs server having too many fds open, he
> is now planning to store corresonding file handle in server and use
> that to open fd later.
>
> But this does not help with persistent file handle issue for fuse
> client.
>

I see. Yes that should work, but he'd still need to cope with reused
inode numbers in case you allow unlinks from the host (do you?),
because LOOKUP can find a host fs inode that does not match
the file handle of a previously found inode of the same ino.

Quoting Miklos' response above:
> > > If LOOKUP_HANDLE returns a nodeid that can be found in the
> > > icache, but which doesn't match the new file handle, then the old
> > > inode will be marked bad and a new one allocated.

This statement, with minor adjustments is also true for LOOKUP:

"If LOOKUP returns a nodeid that can be found in the icache, but
 whose i_generation doesn't match the generation returned in outarg,
 then the old inode should be marked bad and a new one allocated."

> BTW, one concern with file handles coming from guest kernel was that
> how to trust those handles. Guest can create anything and use
> file server to open the files on same filesystem (but not shared
> with guest).
>
> I am assuming same concern should be there with non-virtiofs use
> cases. Regular fuse client must be sending a file handle and
> file server is running with CAP_DAC_READ_SEARCH. How will it make
> sure that client is not able to access files not exported through
> shared directory but are present on same filesystem.
>

That is a concern.
It's the same concern for NFS clients that can guess file handles.

The ways to address this concern with NFS is the export option
subtree_check, but that uses non unique file handles to an inode
which include a parent handle, so that's probably not a good fit for
LOOKUP_HANDLE.

If there are no plans to go forward with LOOKUP_HANDLE, I may
respin my protosal to add  ENCODE/DECODE to FUSE protocol
in place of the lookup(".") command.

Thanks,
Amir.



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux