On Sat, 29 May 2021 at 18:05, Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > On Wed, Sep 23, 2020 at 2:12 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > > > > On Wed, Sep 23, 2020 at 11:57 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > > > > > > > > On Wed, Sep 23, 2020 at 4:49 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote: > > > > > > > > > I think that the proper was to implement reliable persistent file > > > > > handles in fuse/virtiofs would be to add ENCODE/DECODE to > > > > > FUSE protocol and allow the server to handle this. > > > > > > > > Max Reitz (Cc-d) is currently looking into this. > > > > > > > > One proposal was to add LOOKUP_HANDLE operation that is similar to > > > > LOOKUP except it takes a {variable length handle, name} as input and > > > > returns a variable length handle *and* a u64 node_id that can be used > > > > normally for all other operations. > > > > > > Miklos, Max, > > Any updates on LOOKUP_HANDLE work? > > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse > > > > based fs) would be that userspace need not keep a refcounted object > > > > around until the kernel sends a FORGET, but can prune its node ID > > > > based cache at any time. If that happens and a request from the > > > > client (kernel) comes in with a stale node ID, the server will return > > > > -ESTALE and the client can ask for a new node ID with a special > > > > lookup_handle(fh, NULL). > > > > > > > > Disadvantages being: > > > > > > > > - cost of generating a file handle on all lookups > > > > > > I never ran into a local fs implementation where this was expensive. > > > > > > > - cost of storing file handle in kernel icache > > > > > > > > I don't think either of those are problematic in the virtiofs case. > > > > The cost of having to keep fds open while the client has them in its > > > > cache is much higher. > > > > > > > > > > Sounds good. > > > I suppose flock() does need to keep the open fd on server. > > > > Open files are a separate issue and do need an active object in the server. > > > > The issue this solves is synchronizing "released" and "evicted" > > states of objects between server and client. I.e. when a file is > > closed (and no more open files exist referencing the same object) the > > dentry refcount goes to zero but it remains in the cache. In this > > state the server could really evict it's own cached object, but can't > > because the client can gain an active reference at any time via > > cached path lookup. > > > > One other solution would be for the server to send a notification > > (NOTIFY_EVICT) that would try to clean out the object from the server > > cache and respond with a FORGET if successful. But I sort of like > > the file handle one better, since it solves multiple problems. > > > > Even with LOOKUP_HANDLE, I am struggling to understand how we > intend to invalidate all fuse dentries referring to ino X in case the server > replies with reused ino X with a different generation that the one stored > in fuse inode cache. > > This is an issue that I encountered when running the passthrough_hp test, > on my filesystem. In tst_readdir_big() for example, underlying files are being > unlinked and new files created reusing the old inode numbers. > > This creates a situation where server gets a lookup request > for file B that uses the reused inode number X, while old file A is > still in fuse dentry cache using the older generation of real inode > number X which is still in fuse inode cache. > > Now the server knows that the real inode has been rused, because > the server caches the old generation value, but it cannot reply to > the lookup request before the old fuse inode has been invalidated. > IIUC, fuse_lowlevel_notify_inval_inode() is not enough(?). > We would also need to change fuse_dentry_revalidate() to > detect the case of reused/invalidated inode. > > The straightforward way I can think of is to store inode generation > in fuse_dentry. It won't even grow the size of the struct. > > Am I over complicating this? In this scheme the generation number is already embedded in the file handle. If LOOKUP_HANDLE returns a nodeid that can be found in the icache, but which doesn't match the new file handle, then the old inode will be marked bad and a new one allocated. Does that answer your worries? Or am I missing something? Thanks, Miklos