On Mon, Sep 12, 2022 at 5:35 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote: > > On Mon, Sep 12, 2022 at 04:38:48PM +0300, Amir Goldstein wrote: > > On Mon, Sep 12, 2022 at 4:16 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote: > > > > > > On Sun, Sep 11, 2022 at 01:14:49PM +0300, Amir Goldstein wrote: > > > > On Wed, Sep 23, 2020 at 10:44 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote: > > > > > > > > > > One proposal was to add LOOKUP_HANDLE operation that is similar to > > > > > LOOKUP except it takes a {variable length handle, name} as input and > > > > > returns a variable length handle *and* a u64 node_id that can be used > > > > > normally for all other operations. > > > > > > > > > > The advantage of such a scheme for virtio-fs (and possibly other fuse > > > > > based fs) would be that userspace need not keep a refcounted object > > > > > around until the kernel sends a FORGET, but can prune its node ID > > > > > based cache at any time. If that happens and a request from the > > > > > client (kernel) comes in with a stale node ID, the server will return > > > > > -ESTALE and the client can ask for a new node ID with a special > > > > > lookup_handle(fh, NULL). > > > > > > > > > > Disadvantages being: > > > > > > > > > > - cost of generating a file handle on all lookups > > > > > - cost of storing file handle in kernel icache > > > > > > > > > > I don't think either of those are problematic in the virtiofs case. > > > > > The cost of having to keep fds open while the client has them in its > > > > > cache is much higher. > > > > > > > > > > > > > I was thinking of taking a stab at LOOKUP_HANDLE for a generic > > > > implementation of persistent file handles for FUSE. > > > > > > Hi Amir, > > > > > > I was going throug the proposal above for LOOKUP_HANDLE and I was > > > wondering how nodeid reuse is handled. > > > > LOOKUP_HANDLE extends the 64bit node id to be variable size id. > > Ok. So this variable size id is basically file handle returned by > host? > > So this looks little different from what Miklos had suggested. IIUC, > he wanted LOOKUP_HANDLE to return both file handle as well as *node id*. > > ********************************* > One proposal was to add LOOKUP_HANDLE operation that is similar to > LOOKUP except it takes a {variable length handle, name} as input and > returns a variable length handle *and* a u64 node_id that can be used > normally for all other operations. > *************************************** > Ha! Thanks for reminding me about that. It's been a while since I looked at what actually needs to be done. That means that evicting server inodes from cache may not be as easy as I had imagined. > > A server that declares support for LOOKUP_HANDLE must never > > reuse a handle. > > > > That's the basic idea. Just as a filesystem that declares to support > > exportfs must never reuse a file handle. > > > > > > IOW, if server decides to drop > > > nodeid from its cache and reuse it for some other file, how will we > > > differentiate between two. Some sort of generation id encoded in > > > nodeid? > > > > > > > That's usually the way that file handles are implemented in > > local fs. The inode number is the internal lookup index and the > > generation part is advanced on reuse. > > > > But for passthrough fs like virtiofsd, the LOOKUP_HANDLE will > > just use the native fs file handles, so virtiofsd can evict the inodes > > entry from its cache completely, not only close the open fds. > > Ok, got it. Will be interesting to see how kernel fuse changes look > to accomodate this variable sized nodeid. > It may make sense to have a FUSE protocol dialect where nodeid is variable size for all commands, but it probably won't be part of the initial LOOKUP_HANDLE work. > > > > That is what my libfuse_passthough POC does. > > Where have you hosted corresponding kernel changes? > There are no kernel changes. For xfs and ext4 I know how to implement open_by_ino() and I know how to parse the opaque fs file handle to extract ino+generation from it and return them in FUSE_LOOKUP response. So I could manage to implement persistent NFS file handles over the existing FUSE protocol with 64bit node id. Thanks, Amir.