Re: [LSF/MM/BPF TOPIC] Allowing linkat() to replace the destination

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 22, 2020 at 1:05 AM Omar Sandoval <osandov@xxxxxxxxxxx> wrote:
>
> On Sat, Jan 18, 2020 at 02:20:32AM +0000, Al Viro wrote:
> > On Fri, Jan 17, 2020 at 05:17:34PM -0800, Omar Sandoval wrote:
> > > > No.  This is completely wrong; just make it ->link_replace() and be done
> > > > with that; no extra arguments and *always* the same conditions wrt
> > > > positive/negative.  One of the reasons why ->rename() tends to be
> > > > ugly (and a source of quite a few bugs over years) are those "if
> > > > target is positive/if target is negative" scattered over the instances.
> > > >
> > > > Make the choice conditional upon the positivity of target.
> > >
> > > Yup, you already convinced me that ->link_replace() is better in your
> > > last email.
> >
> > FWIW, that might be not so simple ;-/  Reason: NFS-like stuff.  Client
> > sees a negative in cache; the problem is how to decide whether to
> > tell the server "OK, I want normal link()" vs. "if it turns out that
> > someone has created it by the time you see the request, give do
> > a replacing link".  Sure, if could treat ->link() telling you -EEXIST
> > as "OK, repeat it with ->link_replace(), then", but that's an extra
> > roundtrip...
>
> So that's a point in favor of ->link(). But then if we overload ->link()
> instead of adding ->link_replace() and we want EOPNOTSUPP to fail fast,
> we need to add something like FMODE_SUPPORTS_AT_REPLACE.
>
> Some options I see are:
>
> 1. Go with ->link_replace() until network filesystem specs support
>    AT_REPLACE. That would be a bit of a mess down the line, though.
> 2. Stick with ->link(), let the filesystem implementations deal with the
>    positive targets, and add FMODE_SUPPORTS_AT_REPLACE so that feature
>    detection remains easy for userspace.

"detection remains easy..." why is this important?
Do you know of a userspace application that would have a problem checking
if AT_REPLACE works, fall back to whatever and never try it ever again?
Besides, when said application tried to open an O_TMPFILE and fail, it
will have already detected a lot of the unsupported cases.
Sorry for not reading all the thread again, some API questions:
- We intend to allow AT_REPLACE only with O_TMPFILE src. Right?
- Does AT_REPLACE assert that destination is positive? and if so why?
The functionality that is complement to atomic rename would be atomic
link, destination could be positive or negative, but end results will be
that destination is positive with new inode.
With those semantics, ->link_replace() makes much less sense IMO.

> 3. Option 2, but don't bother with FMODE_SUPPORTS_AT_REPLACE.
>
> FWIW, there is precendent for option 3: RENAME_EXCHANGE. That has the
> same "files are the same" noop condition, and we don't know whether
> RENAME_EXCHANGE is supported until ->rename(). A cursory search shows
> that applications using RENAME_EXCHANGE try it and fall back to a
> non-atomic exchange on EINVAL. They could do the exact same thing for
> AT_REPLACE.
>

That sounds like the most reasonable approach to me. Let's not over complicate.
If you find that creates too much generic logic in ->link(), you can take
the approach Darrick employed with generic_remap_file_range_prep() for
filesystems that want to support AT_REPLACE. All other fs just need to check
for valid flags mask, like the ->rename() precedent.

Another side discussion about passing AT_ flags down to filesystems.
Traditionally, that was never done, until AT_STATX_ mixed vfs flags
with filesystem flags on the same AT_ namespace.
Now we have linkat() syscall that can take only AT_ vfs flags and
renameat2() syscall that can take only RENAME_ filesystem flags not
from the AT_ namespace.
I feel that the situation of having AT_REPLACE API along with
RENAME_EXCHANGE and RENAME_NOREPLACE is a bit awkward
and some standardization is in order.

According to include/uapi/linux/fcntl.h, there is no numeric collision
between the RENAME_ flag namepsace and AT_ flags namespace,
although I do find it suspicious that AT_ flags start at 0x100...
Could we define AT_RENAME_xxx RENAME_xxx flags and name the
new flag AT_LINK_REPLACE, so it is a bit more clear that the flag
is specific to link(2) syscall and not vfs generic AT_ flag.

Thanks,
Amir.



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux