On Mon, Sep 12, 2022 at 10:56 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
>
> On Mon, Sep 12, 2022 at 06:07:42PM +0300, Amir Goldstein wrote:
> > On Mon, Sep 12, 2022 at 5:35 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Sep 12, 2022 at 04:38:48PM +0300, Amir Goldstein wrote:
> > > > On Mon, Sep 12, 2022 at 4:16 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Sun, Sep 11, 2022 at 01:14:49PM +0300, Amir Goldstein wrote:
> > > > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > One proposal was to add LOOKUP_HANDLE operation that is similar to
> > > > > > > LOOKUP except it takes a {variable length handle, name} as input and
> > > > > > > returns a variable length handle *and* a u64 node_id that can be used
> > > > > > > normally for all other operations.
> > > > > > >
> > > > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse
> > > > > > > based fs) would be that userspace need not keep a refcounted object
> > > > > > > around until the kernel sends a FORGET, but can prune its node ID
> > > > > > > based cache at any time. If that happens and a request from the
> > > > > > > client (kernel) comes in with a stale node ID, the server will return
> > > > > > > -ESTALE and the client can ask for a new node ID with a special
> > > > > > > lookup_handle(fh, NULL).
> > > > > > >
> > > > > > > Disadvantages being:
> > > > > > >
> > > > > > > - cost of generating a file handle on all lookups
> > > > > > > - cost of storing file handle in kernel icache
> > > > > > >
> > > > > > > I don't think either of those are problematic in the virtiofs case.
> > > > > > > The cost of having to keep fds open while the client has them in its
> > > > > > > cache is much higher.
> > > > > >
> > > > > > I was thinking of taking a stab at LOOKUP_HANDLE for a generic
> > > > > > implementation of persistent file handles for FUSE.
> > > > >
> > > > > Hi Amir,
> > > > >
> > > > > I was going through the proposal above for LOOKUP_HANDLE and I was
> > > > > wondering how nodeid reuse is handled.
> > > >
> > > > LOOKUP_HANDLE extends the 64bit node id to be a variable size id.
> > >
> > > Ok. So this variable size id is basically the file handle returned by
> > > the host?
> > >
> > > So this looks a little different from what Miklos had suggested. IIUC,
> > > he wanted LOOKUP_HANDLE to return both a file handle as well as a *node id*.
> > >
> > > *********************************
> > > One proposal was to add LOOKUP_HANDLE operation that is similar to
> > > LOOKUP except it takes a {variable length handle, name} as input and
> > > returns a variable length handle *and* a u64 node_id that can be used
> > > normally for all other operations.
> > > ***************************************
> >
> > Ha! Thanks for reminding me about that.
> > It's been a while since I looked at what actually needs to be done.
> > That means that evicting server inodes from cache may not be as
> > easy as I had imagined.
> >
> > > > A server that declares support for LOOKUP_HANDLE must never
> > > > reuse a handle.
> > > >
> > > > That's the basic idea. Just as a filesystem that declares support for
> > > > exportfs must never reuse a file handle.
> > > >
> > > > > IOW, if the server decides to drop
> > > > > a nodeid from its cache and reuse it for some other file, how will we
> > > > > differentiate between the two? Some sort of generation id encoded in
> > > > > the nodeid?
> > > >
> > > > That's usually the way that file handles are implemented in
> > > > a local fs. The inode number is the internal lookup index and the
> > > > generation part is advanced on reuse.
> > > >
> > > > But for a passthrough fs like virtiofsd, LOOKUP_HANDLE will
> > > > just use the native fs file handles, so virtiofsd can evict the inode
> > > > entries from its cache completely, not only close the open fds.
> > >
> > > Ok, got it.
> > > It will be interesting to see how the kernel fuse changes look
> > > to accommodate this variable sized nodeid.
> >
> > It may make sense to have a FUSE protocol dialect where the nodeid
> > is variable size for all commands, but it probably won't be part of
> > the initial LOOKUP_HANDLE work.
> >
> > > > That is what my libfuse_passthrough POC does.
> > >
> > > Where have you hosted the corresponding kernel changes?
> >
> > There are no kernel changes.
> >
> > For xfs and ext4 I know how to implement open_by_ino()
> > and I know how to parse the opaque fs file handle to extract
> > ino+generation from it and return them in the FUSE_LOOKUP
> > response.
>
> Aha, interesting. So this is filesystem specific. It works on xfs/ext4 but
> not necessarily on other filesystems like nfs (because they have their
> own way of encoding things in the file handle).

Correct. If it is not xfs/ext4, the passthrough implementation uses
open fds. In the non xfs/ext4 case, an NFS client may get ESTALE errors
after server restart and, worse, get the content of the wrong object if
the server has reused nodeid+generation after restart.

> > So I could manage to implement persistent NFS file handles
> > over the existing FUSE protocol with a 64bit node id.
>
> And that explains why you did not have to make kernel changes. But
> this does not allow the server to close the fd associated with a nodeid?
> Or is there a way for the server to generate a file handle and then call
> open_by_handle_at()?

Yes, that's what the server does in the case of ext4/xfs:

https://github.com/amir73il/libfuse/blob/fuse_passthrough/passthrough/fuse_passthrough.cpp#L912

Thanks,
Amir.