Re: [PATCH v4 1/3] FUSE: Implement atomic lookup + create

On Thu, May 05, 2022 at 10:21:21AM +0530, Dharmendra Hans wrote:
> On Wed, May 4, 2022 at 8:17 PM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> >
> > On Wed, May 04, 2022 at 09:56:49AM +0530, Dharmendra Hans wrote:
> > > On Wed, May 4, 2022 at 1:24 AM Vivek Goyal <vgoyal@xxxxxxxxxx> wrote:
> > > >
> > > > On Mon, May 02, 2022 at 03:55:19PM +0530, Dharmendra Singh wrote:
> > > > > From: Dharmendra Singh <dsingh@xxxxxxx>
> > > > >
> > > > > > When we create a file (O_CREAT), we trigger
> > > > > > a lookup to FUSE USER SPACE. It is very likely
> > > > > > that the file does not exist yet, as O_CREAT is passed to
> > > > > > open(). This lookup can be avoided and performed
> > > > > > as part of the create call into libfuse.
> > > > > >
> > > > > > This lookup + create in a single call to libfuse, and finally
> > > > > > to USER SPACE, has been named atomic create. It is expected
> > > > > > that USER SPACE creates the file, opens it and fills in the
> > > > > > attributes, which are then used to set up or revalidate the inode
> > > > > > in the kernel cache. Also, if the file was newly created (it did
> > > > > > not exist before this call) in USER SPACE, this should be indicated
> > > > > > in `struct fuse_file_info` by setting a bit, which libfuse in turn
> > > > > > uses to send flags back to the fuse kernel indicating that
> > > > > > the file was newly created. The kernel uses these flags to
> > > > > > record changes in the parent directory.
> > > >
> > > > Reading the existing code a little bit more and trying to understand
> > > > the existing semantics. That will help me understand what new is being
> > > > done.
> > > >
> > > > So current fuse_atomic_open() does following.
> > > >
> > > > A. Looks up dentry (if d_in_lookup() is set).
> > > > B. If dentry is positive or O_CREAT is not set, return.
> > > > C. If server supports atomic create + open, use that to create file and
> > > >    open it as well.
> > > > D. If server does not support atomic create + open, just create file
> > > >    using "mknod" and return. VFS will take care of opening the file.
> > > >
> > > > Now with this patch, new flow is.
> > > >
> > > > A. Look up dentry if d_in_lookup() is set and either the file is not
> > > >    being created or fc->no_atomic_create is set. This basically means
> > > >    skip lookup if atomic_create is supported and file is being created.
> > > >
> > > > B. Remains same. If dentry is positive or O_CREAT is not set, return.
> > > >
> > > > C. If server supports new atomic_create(), use that.
> > > >
> > > > D. If not, if server supports atomic create + open, use that
> > > >
> > > > E. If not, fall back to mknod and do not open file.
> > > >
> > > > So to me this new functionality is basically atomic "lookup + create +
> > > > open"?
> > > >
> > > > Or maybe not. I see we check "fc->no_create" and fall back to mknod.
> > > >
> > > >         if (fc->no_create)
> > > >                 goto mknod;
> > > >
> > > > So fc->no_create is representing both old atomic "create + open" as well
> > > > as new "lookup + create + open" ?
> > > >
> > > > It might be obvious to you, but it is not to me. So will be great if
> > > > you shed some light on this.
> > > >
> > >
> > > I think you got it right now. The new atomic create does what you
> > > described as the new flow. It does lookup + create + open in a single call
> > > (being called atomic create) to USER SPACE. mknod is a special case
> >
> > Ok, the naming is a little confusing. I think we will have to explain, in
> > the commit message and where you define FUSE_ATOMIC_CREATE, what
> > the difference between FUSE_CREATE and FUSE_ATOMIC_CREATE is. This is
> > ATOMIC w.r.t. what?
> 
> Sure, I will update the commit message to make the distinction between
> the two clear. This operation is atomic w.r.t. USER SPACE FUSE
> implementations, i.e. USER SPACE would be performing all these
> operations in a single call to it.

I think even FUSE_CREATE is doing the same thing: creating the file, opening
it, doing the lookup and sending all the data. So that's not the difference
between the two, IMHO. And that's why I am getting confused by the
naming.

From the user space file server's perspective, the only extra operation seems
to be that it sends a flag in the response telling the client whether the
file was actually created or already existed. So to me it just
sounds like a small extension of the existing FUSE_CREATE command, and that's
why I thought calling it FUSE_CREATE_EXT is probably better naming.


> 
> 
> > Maybe atomic here means that "lookup + create + open" is a single operation.
> > But then even FUSE_CREATE is atomic because "create + open" is a single
> > operation.
> >
> > In fact FUSE_CREATE does lookup anyway and returns all the information
> > in fuse_entry_out.
> >
> > IIUC, the only difference between FUSE_CREATE and FUSE_ATOMIC_CREATE is that
> > the latter also carries information in the reply about whether the file was
> > actually created (FOPEN_FILE_CREATED). This will be set if the file did not
> > exist already and was indeed created. Is that right?
> 
> FUSE_CREATE is atomic, but only up to the level of libfuse. Libfuse separates
> it into two calls, create and lookup, made separately into the USER SPACE FUSE
> implementation.

I am not sure what you mean by "libfuse separates it into two calls,
create and lookup separately". I guess you are referring to lo_create()
in example/passthrough_ll.c, which first creates and opens the file and
then looks it up and replies.

        fd = openat(lo_fd(req, parent), name,
                    (fi->flags | O_CREAT) & ~O_NOFOLLOW, mode);
        err = lo_do_lookup(req, parent, name, &e);
        fuse_reply_create(req, &e, fi);

I am looking at your proposal for atomic_create implementation here.

https://github.com/libfuse/libfuse/pull/673/commits/88cd25b2857f2bb213d01afbcfd666787d1e6893#diff-a36385ec8fb753d6f4492a5f0d3c6a5750bd370b50df6ef0610efdcd3f8880ffR787

It is doing exactly the same thing as lo_create(), with one difference: it
first checks whether the file exists. It essentially is doing this.

A. newfd = openat(lo_fd(req, parent), name, O_PATH | O_NOFOLLOW);
B. fd = openat(lo_fd(req, parent), name,
               (fi->flags | O_CREAT) & ~O_NOFOLLOW, mode);
C. err = lo_do_lookup(req, parent, name, &e);
D. fuse_reply_create(req, &e, fi);

So what do you mean by "libfuse makes it two calls"?

And I think the above implementation is racy. What if the filesystem is
shared and another client creates the file between calls
A and B? You will think you created the file, but some other
client created it.

So if the intent is to know whether we created the file or not, then
you should probably do openat() with the O_EXCL flag. If that succeeds,
you know you created the file. If it fails with -EEXIST, then you
know the file is already there. That's what virtiofs does.
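
To make the O_EXCL idea concrete, here is a minimal sketch. It is not
taken from virtiofsd or libfuse; open_or_create() is a hypothetical
helper name, and the retry loop is my assumption about how one would
handle the file disappearing between the two openat() calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a libfuse or virtiofsd API).
 * Opens `name` relative to `dirfd`, creating it if needed, and reports in
 * `*created` whether this call was the one that created the file.
 * Returns the new fd, or -1 with errno set.
 */
static int open_or_create(int dirfd, const char *name, int flags, mode_t mode,
                          bool *created)
{
    assert(name != NULL && created != NULL);

    /* Remember whether the caller itself asked for exclusive creation. */
    bool caller_excl = (flags & O_EXCL) != 0;
    flags &= ~(O_CREAT | O_EXCL);

    for (;;) {
        /* O_CREAT | O_EXCL succeeds only if this call creates the file. */
        int fd = openat(dirfd, name, flags | O_CREAT | O_EXCL, mode);
        if (fd >= 0) {
            *created = true;
            return fd;
        }
        if (errno != EEXIST || caller_excl)
            return -1;

        /* The file already exists: open it without creating anything. */
        fd = openat(dirfd, name, flags);
        if (fd >= 0) {
            *created = false;
            return fd;
        }
        if (errno != ENOENT)
            return -1;
        /* The file was removed between the two calls; try again. */
    }
}
```

With something like this, the server never has to guess from a prior
existence check: `created` is true only when the exclusive openat()
itself succeeded.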

Anyway, coming back to the point. IMHO, from the server's perspective,
there is no atomicity difference between FUSE_CREATE and
FUSE_ATOMIC_CREATE. The only difference seems to be sending additional
information back to the client to tell it whether the file was created
or not.

In fact, for a shared filesystem this is probably a problem. What if the
guest's cache is stale and it does not know about the file? Another client
B creates the file, so we conclude that we did not create it and
return with FOPEN_FILE_CREATED = 0. In that case the client
will not call fuse_dir_changed(). But that seems wrong in the case
of shared filesystems. I am concerned about virtiofs, which can
be shared between different guests.

Miklos, WDYT?

Maybe it is not a huge concern. If one guest drops a file, the other guest
will not invalidate its dir attrs until the timeout happens. The case of a
shared filesystem is very tricky with fuse. And sometimes it is not clear
to me what kind of coherency matters.

So in this case, say I am booted with cache=auto. If guest B drops a file
bar/foo.txt and guest A does open(bar/foo.txt, O_CREAT), should
guest A invalidate the attrs of bar/ right away, or is it enough that they
will be invalidated anyway after a second?

Anyway, my core point is that the difference between FUSE_CREATE and
FUSE_ATOMIC_CREATE is just one flag, FOPEN_FILE_CREATED, which tells the
client whether the file was actually created or not. And that is used
to determine whether to invalidate the parent dir attributes or not. It
does not add anything extra in terms of ATOMICITY as far as I can
see, and that's what confuses me.



> This FUSE_ATOMIC_CREATE does all these ops in a single call to FUSE
> implementations.  We do not want to break any existing FUSE
> implementations, therefore it is being introduced as a new feature. I
> forgot to include links to libfuse patches as well. That would have
> made it much clearer.  Here is the link to libfuse patch for this call
> https://github.com/libfuse/libfuse/pull/673.
> 
> >
> > I see FOPEN_FILE_CREATED is being used to avoid calling
> > fuse_dir_changed(). That sounds like a separate optimization and probably
> > should be in a separate patch.
> 
> FUSE_ATOMIC_CREATE needs to send back info about whether the file was
> actually created or not (this is a suggestion from Miklos) to correctly
> convey whether the parent dir is really changing. I included this as part
> of this patch itself instead of having it as a separate patch.

This needs a little more thought w.r.t. shared filesystems.

> 
> > IOW, I think this patch should be broken in to multiple pieces. First
> > piece seems to be avoiding lookup() and given the way it is implemented,
> > looks like we can avoid lookup() even by using existing FUSE_CREATE
> > command. We don't necessarily need FUSE_ATOMIC_CREATE. Is that right?
> 
> It's not only about changing fuse kernel code but USER SPACE
> implementations also. If we change it the way you are suggesting, we would
> be required to twist many things in libfuse and the FUSE low level API. So
> to keep things simple and not break any existing implementations, we
> have kept it as a new call (we now pass `struct stat` to USER SPACE FUSE
> to be filled in).
> 
> > And once that is done, a separate patch should probably explain
> > the problem and say that the fuse_dir_changed() call can be avoided if we
> > knew whether the file was actually created or already existed. And that's
> > when one needs to introduce a new command. Given this is just an extension
> > of the existing FUSE_CREATE command and returns additional info about
> > FOPEN_FILE_CREATED, we probably should simply call it FUSE_CREATE_EXT
> > and explain how this operation is different from FUSE_CREATE.
> 
> As explained above, we are not doing it this way because we have kept
> all existing libfuse APIs in mind as well.
>

I am not seeing what will break in existing passthrough_ll.c if I
simply used FUSE_CREATE for a file which already exists.

        fd = openat(lo_fd(req, parent), name,
                    (fi->flags | O_CREAT) & ~O_NOFOLLOW, mode);

This call should still succeed even if the file already exists (unless
it is called with O_EXCL, in which case failing is the correct
behavior).
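
A small standalone check of that claim, as a sketch only. Both
open_creat() and check_creat_on_existing() are hypothetical names
made up for this mail, not passthrough_ll.c functions; the flag
handling just mirrors the openat() call quoted above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Same flag handling as the openat() call in lo_create() quoted above. */
static int open_creat(int dirfd, const char *name, int flags, mode_t mode)
{
    return openat(dirfd, name, (flags | O_CREAT) & ~O_NOFOLLOW, mode);
}

/*
 * Hypothetical check. Returns 0 if O_CREAT on an existing file succeeds
 * without O_EXCL and fails with EEXIST when O_EXCL is added; -1 otherwise.
 * Removes `name` before and after the check.
 */
static int check_creat_on_existing(int dirfd, const char *name)
{
    assert(name != NULL);
    unlinkat(dirfd, name, 0);

    /* First call creates the file. */
    int first = open_creat(dirfd, name, O_RDWR, 0600);
    if (first < 0)
        return -1;

    /* Second call hits an existing file and should still succeed. */
    int second = open_creat(dirfd, name, O_RDWR, 0600);

    /* With O_EXCL the same call should fail with EEXIST. */
    int third = open_creat(dirfd, name, O_RDWR | O_EXCL, 0600);
    int excl_errno = errno;

    int rc = (second >= 0 && third < 0 && excl_errno == EEXIST) ? 0 : -1;

    close(first);
    if (second >= 0)
        close(second);
    if (third >= 0)
        close(third);
    unlinkat(dirfd, name, 0);
    return rc;
}
```

Calling check_creat_on_existing(AT_FDCWD, "some-scratch-name") in a
writable directory should return 0 on a POSIX system.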

Thanks
Vivek



