Re: Potential regression after fsnotify_nameremove() rework in 5.3

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jan 17, 2022 at 4:21 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Mon 17-01-22 15:14:53, Amir Goldstein wrote:
> > On Mon, Jan 17, 2022 at 4:34 AM Ivan Delalande <colona@xxxxxxxxxx> wrote:
> > >
> > > On Sun, Jan 16, 2022 at 12:14:01PM +0200, Amir Goldstein wrote:
> > > > On Sun, Jan 16, 2022 at 3:20 AM Ivan Delalande <colona@xxxxxxxxxx> wrote:
> > > >> On Sat, Jan 15, 2022 at 09:50:20PM +0200, Amir Goldstein wrote:
> > > >>> On Sat, Jan 15, 2022 at 5:11 AM Ivan Delalande <colona@xxxxxxxxxx> wrote:
> > > >>>> Sorry to bring this up so late but we might have found a regression
> > > >>>> introduced by your "Sort out fsnotify_nameremove() mess" patch series
> > > >>>> merged in 5.3 (116b9731ad76..7377f5bec133), and that can still be
> > > >>>> reproduced on v5.16.
> > > >>>>
> > > >>>> Some of our processes use inotify to watch for IN_DELETE events (for
> > > >>>> files on tmpfs mostly), and relied on the fact that once such events are
> > > >>>> received, the files they refer to have actually been unlinked and can't
> > > >>>> be open/read. So if and once open() succeeds then it is a new version of
> > > >>>> the file that has been recreated with new content.
> > > >>>>
> > > >>>> This was true and working reliably before 5.3, but changed after
> > > >>>> 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of
> > > >>>> d_delete()") specifically. There is now a time window where a process
> > > >>>> receiving one of those IN_DELETE events may still be able to open the
> > > >>>> file and read its old content before it's really unlinked from the FS.
> > > >>>
> > > >>> This is a bit surprising to me.
> > > >>> Do you have a reproducer?
> > > >>
> > > >> Yeah, I was using the following one to bisect this. It will print a
> > > >> message every time it succeeds to read the file after receiving a
> > > >> IN_DELETE event when run with something like `mkdir /tmp/foo;
> > > >> ./indelchurn /tmp/foo`. It seems to hit pretty frequently and reliably
> > > >> on various systems after 5.3, even for different #define-parameters.
> > > >>
> > > >
> > > > I see yes, it's a race between fsnotify_unlink() and d_delete()
> > > > fsnotify_unlink() in explicitly required to be called before d_delete(), so
> > > > it has the d_inode information and that leaves a windows for opening
> > > > the file from cached dentry before d_delete().
> > > >
> > > > I would rather that we try to address this not as a regression until
> > > > there is proof of more users that expect the behavior you mentioned.
> > > > I would like to provide you an API to opt-in for this behavior, because
> > > > fixing it for everyone may cause other workloads to break.
> > > >
> > > > Please test the attached patch on top of v5.16 and use
> > > > IN_DELETE|IN_EXCL_UNLINK as the watch mask for testing.
> > > >
> > > > I am assuming that it would be possible for you to modify the application
> > > > and add the IN_EXCL_UNLINK flag and that your application does not
> > > > care about getting IN_OPEN events on unlinked files?
> > > >
> > > > My patch overloads the existing flag IN_EXCL_UNLINK with a new
> > > > meaning. It's a bit of a hack and we can use some other flag if we need to
> > > > but it actually makes some sense that an application that does not care for
> > > > events on d_unlinked() files will be guaranteed to not get those events
> > > > after getting an IN_DELETE event. It is another form of the race that you
> > > > described.
> > > >
> > > > Will that solution work out for you?
> > >
> > > Yeah, sounds perfect for us, and adding IN_EXCL_UNLINK to our
> > > applications is fine indeed. I've tested the 5.16 patch on my laptop
> > > with the reproducer and can't reproduce the issue. I've also tried the
> > > 5.10 patch on our products and also stop seeing the issue both with
> > > the reproducer but also with our internal applications and test cases
> > > that made us look into this initially. So this looks like a good fix on
> > > our side at least.
> > >
> >
> > I am glad the patch addresses your issue.
> > However, I am not sure if I should even post it upstream,
> > unless more people ask for it.
> >
> > My point of view is that IN_DELETE does not have enough
> > information for an "invalidate file" message.
> > FAN_DELETE, otoh, with recently merged FAN_REPORT_TARGET_FID
> > includes an information record with the unique and non-reusable file id of the
> > unlinked inode.
> >
> > That should allow your application to correctly invalidate the state files
> > that it accesses on kernel >= v5.17.
> >
> > Jan, do you have a different opinion?
>
> Yeah, I was thinking about this. I don't quite like your hack with inotify
> flag. Firstly, it requires cooperation from userspace (setting the flag),
> secondly, d_drop() in fsnotify code is unexpected and ugly on the kernel
> side, and overall adding yet another special case to fsnotify code is not
> very compelling either.
>
> I agree transitioning to fanotify may be a nice solution for the
> application but I'm not sure how viable that is short term (requiring very
> new kernel, maybe non-trivial cost of porting the application to fanotify).
> Since this fully lies within the "we do not regress userspace" boundaries -
> I'm not surprised the application does not expect to see a file for which
> it got IN_DELETE event - I guess we should solve this transparently within
> the kernel if we can. So far we've got only one report but I'd say there
> are other applications like this out there, just they didn't transition to
> new enough kernel yet or were lucky enough to not hit the problem yet.
>
> One possibility I can see is: Add fsnotify primitive to create the event,
> just not queue it in the notification queue yet (essentially we would
> cut-short the event handling before calling fsnotify_insert_event() /
> fsnotify_add_event()), only return it. Then another primitive would be for
> queueing already prepared event. Then the sequence for unlink would be:
>
>         LIST_HEAD(event_list);
>
>         fsnotify_events_prepare(&event_list, ...);
>         d_delete(dentry);
>         fsnotify_events_report(&event_list);
>
> And we can optionally wrap this inside d_delete_notify() to make it easier
> on the callers. What do you think?
>

I think it sounds like the "correct" design, but also sounds like a
big change that
is not so practical for backporting.

Given that this is a regression that goes way back, backportability
plays a role.
Also, a big change like this needs developer time, which I myself don't have
at the moment.

For a simpler backportable solution, instead of preparing the event
perhaps it is enough that we ihold() the inode until after fsnotify_unlink()
and pass it as an argument very similar to fsnotify_link().

The question is how to ihold() the inode only if we are going to queue
an IN_DELETE event? Maybe send an internal FS_PRE_DELETE
event?

I am currently out of better ideas.

Thanks,
Amir.



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux