Re: Potential regression after fsnotify_nameremove() rework in 5.3

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jan 17, 2022 at 4:34 AM Ivan Delalande <colona@xxxxxxxxxx> wrote:
>
> On Sun, Jan 16, 2022 at 12:14:01PM +0200, Amir Goldstein wrote:
> > On Sun, Jan 16, 2022 at 3:20 AM Ivan Delalande <colona@xxxxxxxxxx> wrote:
> >> On Sat, Jan 15, 2022 at 09:50:20PM +0200, Amir Goldstein wrote:
> >>> On Sat, Jan 15, 2022 at 5:11 AM Ivan Delalande <colona@xxxxxxxxxx> wrote:
> >>>> Sorry to bring this up so late but we might have found a regression
> >>>> introduced by your "Sort out fsnotify_nameremove() mess" patch series
> >>>> merged in 5.3 (116b9731ad76..7377f5bec133), and that can still be
> >>>> reproduced on v5.16.
> >>>>
> >>>> Some of our processes use inotify to watch for IN_DELETE events (for
> >>>> files on tmpfs mostly), and relied on the fact that once such events are
> >>>> received, the files they refer to have actually been unlinked and can't
> >>>> be open/read. So if and once open() succeeds then it is a new version of
> >>>> the file that has been recreated with new content.
> >>>>
> >>>> This was true and working reliably before 5.3, but changed after
> >>>> 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of
> >>>> d_delete()") specifically. There is now a time window where a process
> >>>> receiving one of those IN_DELETE events may still be able to open the
> >>>> file and read its old content before it's really unlinked from the FS.
> >>>
> >>> This is a bit surprising to me.
> >>> Do you have a reproducer?
> >>
> >> Yeah, I was using the following one to bisect this. It will print a
> >> message every time it succeeds to read the file after receiving a
> >> IN_DELETE event when run with something like `mkdir /tmp/foo;
> >> ./indelchurn /tmp/foo`. It seems to hit pretty frequently and reliably
> >> on various systems after 5.3, even for different #define-parameters.
> >>
> >
> > I see yes, it's a race between fsnotify_unlink() and d_delete()
> > fsnotify_unlink() in explicitly required to be called before d_delete(), so
> > it has the d_inode information and that leaves a windows for opening
> > the file from cached dentry before d_delete().
> >
> > I would rather that we try to address this not as a regression until
> > there is proof of more users that expect the behavior you mentioned.
> > I would like to provide you an API to opt-in for this behavior, because
> > fixing it for everyone may cause other workloads to break.
> >
> > Please test the attached patch on top of v5.16 and use
> > IN_DELETE|IN_EXCL_UNLINK as the watch mask for testing.
> >
> > I am assuming that it would be possible for you to modify the application
> > and add the IN_EXCL_UNLINK flag and that your application does not
> > care about getting IN_OPEN events on unlinked files?
> >
> > My patch overloads the existing flag IN_EXCL_UNLINK with a new
> > meaning. It's a bit of a hack and we can use some other flag if we need to
> > but it actually makes some sense that an application that does not care for
> > events on d_unlinked() files will be guaranteed to not get those events
> > after getting an IN_DELETE event. It is another form of the race that you
> > described.
> >
> > Will that solution work out for you?
>
> Yeah, sounds perfect for us, and adding IN_EXCL_UNLINK to our
> applications is fine indeed. I've tested the 5.16 patch on my laptop
> with the reproducer and can't reproduce the issue. I've also tried the
> 5.10 patch on our products and also stop seeing the issue both with
> the reproducer but also with our internal applications and test cases
> that made us look into this initially. So this looks like a good fix on
> our side at least.
>

I am glad the patch addresses your issue.
However, I am not sure if I should even post it upstream,
unless more people ask for it.

My point of view is that IN_DELETE does not have enough
information for an "invalidate file" message.
FAN_DELETE, otoh, with recently merged FAN_REPORT_TARGET_FID
includes an information record with the unique and non-reusable file id of the
unlinked inode.

That should allow your application to correctly invalidate the state files
that it accesses on kernel >= v5.17.

Jan, do you have a different opinion?

Thanks,
Amir.



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux