On Mon, Jan 17, 2022 at 4:34 AM Ivan Delalande <colona@xxxxxxxxxx> wrote: > > On Sun, Jan 16, 2022 at 12:14:01PM +0200, Amir Goldstein wrote: > > On Sun, Jan 16, 2022 at 3:20 AM Ivan Delalande <colona@xxxxxxxxxx> wrote: > >> On Sat, Jan 15, 2022 at 09:50:20PM +0200, Amir Goldstein wrote: > >>> On Sat, Jan 15, 2022 at 5:11 AM Ivan Delalande <colona@xxxxxxxxxx> wrote: > >>>> Sorry to bring this up so late but we might have found a regression > >>>> introduced by your "Sort out fsnotify_nameremove() mess" patch series > >>>> merged in 5.3 (116b9731ad76..7377f5bec133), and that can still be > >>>> reproduced on v5.16. > >>>> > >>>> Some of our processes use inotify to watch for IN_DELETE events (for > >>>> files on tmpfs mostly), and relied on the fact that once such events are > >>>> received, the files they refer to have actually been unlinked and can't > >>>> be open/read. So if and once open() succeeds then it is a new version of > >>>> the file that has been recreated with new content. > >>>> > >>>> This was true and working reliably before 5.3, but changed after > >>>> 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of > >>>> d_delete()") specifically. There is now a time window where a process > >>>> receiving one of those IN_DELETE events may still be able to open the > >>>> file and read its old content before it's really unlinked from the FS. > >>> > >>> This is a bit surprising to me. > >>> Do you have a reproducer? > >> > >> Yeah, I was using the following one to bisect this. It will print a > >> message every time it succeeds to read the file after receiving a > >> IN_DELETE event when run with something like `mkdir /tmp/foo; > >> ./indelchurn /tmp/foo`. It seems to hit pretty frequently and reliably > >> on various systems after 5.3, even for different #define-parameters. > >> > > > > I see yes, it's a race between fsnotify_unlink() and d_delete() > > fsnotify_unlink() in explicitly required to be called before d_delete(), so > > it has the d_inode information and that leaves a windows for opening > > the file from cached dentry before d_delete(). > > > > I would rather that we try to address this not as a regression until > > there is proof of more users that expect the behavior you mentioned. > > I would like to provide you an API to opt-in for this behavior, because > > fixing it for everyone may cause other workloads to break. > > > > Please test the attached patch on top of v5.16 and use > > IN_DELETE|IN_EXCL_UNLINK as the watch mask for testing. > > > > I am assuming that it would be possible for you to modify the application > > and add the IN_EXCL_UNLINK flag and that your application does not > > care about getting IN_OPEN events on unlinked files? > > > > My patch overloads the existing flag IN_EXCL_UNLINK with a new > > meaning. It's a bit of a hack and we can use some other flag if we need to > > but it actually makes some sense that an application that does not care for > > events on d_unlinked() files will be guaranteed to not get those events > > after getting an IN_DELETE event. It is another form of the race that you > > described. > > > > Will that solution work out for you? > > Yeah, sounds perfect for us, and adding IN_EXCL_UNLINK to our > applications is fine indeed. I've tested the 5.16 patch on my laptop > with the reproducer and can't reproduce the issue. I've also tried the > 5.10 patch on our products and also stop seeing the issue both with > the reproducer but also with our internal applications and test cases > that made us look into this initially. So this looks like a good fix on > our side at least. > I am glad the patch addresses your issue. However, I am not sure if I should even post it upstream, unless more people ask for it. My point of view is that IN_DELETE does not have enough information for an "invalidate file" message. FAN_DELETE, otoh, with recently merged FAN_REPORT_TARGET_FID includes an information record with the unique and non-reusable file id of the unlinked inode. That should allow your application to correctly invalidate the state files that it accesses on kernel >= v5.17. Jan, do you have a different opinion? Thanks, Amir.