Re: [PATCH v13 00/10] fuse: Add support for passthrough read/write

Amir Goldstein <amir73il@xxxxxxxxx> · Thu, 21 Sep 2023 10:33:21 +0300

On Wed, Sep 20, 2023 at 9:15 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>
> On Wed, 20 Sept 2023 at 15:57, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> > Unlike Alession's version, this implementation uses the explicit ioctl
> > flag FUSE_BACKING_MAP_ID to setup an "inode unbound" backing file
> > and the explicit open flag FOPEN_PASSTHROUGH_AUTO_CLOSE
> > to unmap the backing id, when the backing file is assigned to the fuse file.
>
> That sounds about right.  I'm not sure the flexibility provided by the
> auto/noauto mode is really needed, but taking this out later in the
> development process shouldn't be a problem.
>
> >
> > FOPEN_PASSTHROUGH (without AUTO_CLOSE) would let the server
> > manage the backing ids, but that mode is not implemented in the example.
> >
> > The seconds passthrough_hp commit:
> > - Enable passthrough of multiple fuse files to the same backing file
> >
> > Maps all the rdonly FUSE files to the same "inode bound" backing
> > file that is setup on the first rdonly open of an inode.
> > The implementation uses the ioctl flag FUSE_BACKING_MAP_INODE
> > and the open flag FOPEN_PASSTHROUGH (AUTO_CLOSE not
> > supported in this mode).
>
> I think the natural interface would be to setup the inode mapping in
> the lookup reply (using the normal backing ID).
>
> With your current method, what's the good time for establishing the
> mapping?  After the lookup succeeded, obviously.  But then it might
> already have raced with open...
>

I don't understand the concern.

This API leaves the control of when to setup/teardown the
mapping completely in the hands of the server.

Avoiding races in the "inode bound" mode is also the responsibility
of the server.

Quoting from the kernel commit of FUSE_BACKING_MAP_INODE:

"If an inode bound backing file already exists, the ioctl returns -EEXIST.
 Managing which thread sets up the backing file for concurrent file open
 requests is the responsibility of the server.
...
 The FUSE server may call FUSE_DEV_IOC_BACKING_CLOSE ioctl to break
 the association between the backing file and the inode.

 If there is no inode bound backing file, the ioctl returns -ENOENT.
 Managing which thread detaches the backing file is the responsibility of
 the server."

In my example, the server happens to setup the mapping
of the backing file to inode
*before the first open for read on the inode*
and to teardown the mapping
*after the last close on the inode* (not even last rdonly file close)
but this is an arbitrary implementation choice of the server.

This example server has a lock on the server inode object,
which is already used among other things to safely track the number
of open fd on that inode (inode.nopen), so it was easy to
implement the map/unmap in these code points.

> >
> > The "inode bound" shared backing file is released of inode evict
> > of course, but the example explicitly unmaps the inode bound
> > backing file on close of the last open file of an inode.
> >
> > Writable files still use the per-open unmap-on-release mode.
>
> What's the reason for using different modes on different types of opens?
>

The reason is to demonstrate that different FUSE files can use
different backing file modes and that several FUSE files can share
the same backing file.

In the "inode bound" mode, the backing file is *shared* among all
openers of the inode that set backing_id = 0 in the open reply.

If I wanted to demo share the same backing file for readers and writers
I would have needed to re-open the backing file with O_RDWR
on the first rdonly open.

It was convenient for the purpose of the demo to share the rdonly
file from the first rdonly open, among all readers. This demo is
not 100% correct in the sense that if the first open for read
has O_NOATIME and the other open for read do not, all will
use the backing file with O_NOATIME, but this is a minor
implementation detail that the server is fully able to handle.

>From your questions, it seems that you either do not like my
proposal of the special backing_id 0, or maybe you missed it
because I did not point it out.

The current kernel implementation does NOT refcount the
backing_id only the fuse_backing object.

The way that auto-close-on-evict is implemented is simply by
not having a backing_id for the "inode bound" backing files.

The way that auto-close-on-fput is implemented is simply by
handing over the backing_id from server to kernel on open
reply, which unmaps the backing_id and atomically transfers the
ownership of the fuse_backing object to the fuse file.

This is not the most powerful API, but
1. It is simple
2. It can be extended later (i.e. by allowing both
    FUSE_BACKING_MAP_{INODE,ID} or another flag)

Is that clear? Do I need to document it better?
Do you find this API useful/acceptable?

If not, I can drop the FUSE_BACKING_MAP_INODE mode
patch and implement the server managed backing ids mode
in the example (by storing the backing_id in the server inode).

Thanks,
Amir.