Re: Potential regression after fsnotify_nameremove() rework in 5.3

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Jan 16, 2022 at 3:20 AM Ivan Delalande <colona@xxxxxxxxxx> wrote:
>
> On Sat, Jan 15, 2022 at 09:50:20PM +0200, Amir Goldstein wrote:
> > On Sat, Jan 15, 2022 at 5:11 AM Ivan Delalande <colona@xxxxxxxxxx> wrote:
> >> Sorry to bring this up so late but we might have found a regression
> >> introduced by your "Sort out fsnotify_nameremove() mess" patch series
> >> merged in 5.3 (116b9731ad76..7377f5bec133), and that can still be
> >> reproduced on v5.16.
> >>
> >> Some of our processes use inotify to watch for IN_DELETE events (for
> >> files on tmpfs mostly), and relied on the fact that once such events are
> >> received, the files they refer to have actually been unlinked and can't
> >> be open/read. So if and once open() succeeds then it is a new version of
> >> the file that has been recreated with new content.
> >>
> >> This was true and working reliably before 5.3, but changed after
> >> 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of
> >> d_delete()") specifically. There is now a time window where a process
> >> receiving one of those IN_DELETE events may still be able to open the
> >> file and read its old content before it's really unlinked from the FS.
> >
> > This is a bit surprising to me.
> > Do you have a reproducer?
>
> Yeah, I was using the following one to bisect this. It will print a
> message every time it succeeds to read the file after receiving a
> IN_DELETE event when run with something like `mkdir /tmp/foo;
> ./indelchurn /tmp/foo`. It seems to hit pretty frequently and reliably
> on various systems after 5.3, even for different #define-parameters.
>

I see yes, it's a race between fsnotify_unlink() and d_delete()
fsnotify_unlink() in explicitly required to be called before d_delete(), so
it has the d_inode information and that leaves a windows for opening
the file from cached dentry before d_delete().

I would rather that we try to address this not as a regression until
there is proof of more users that expect the behavior you mentioned.
I would like to provide you an API to opt-in for this behavior, because
fixing it for everyone may cause other workloads to break.

Please test the attached patch on top of v5.16 and use
IN_DELETE|IN_EXCL_UNLINK as the watch mask for testing.

I am assuming that it would be possible for you to modify the application
and add the IN_EXCL_UNLINK flag and that your application does not
care about getting IN_OPEN events on unlinked files?

My patch overloads the existing flag IN_EXCL_UNLINK with a new
meaning. It's a bit of a hack and we can use some other flag if we need to
but it actually makes some sense that an application that does not care for
events on d_unlinked() files will be guaranteed to not get those events
after getting an IN_DELETE event. It is another form of the race that you
described.

Will that solution work out for you?

> >> I'm not very familiar with the VFS and fsnotify internals, would you
> >> consider this a regression, or was there never any intentional guarantee
> >> for that behavior and it's best we work around this change in userspace?
> >
> > It may be a regression if applications depend on behavior that changed,
> > but if are open for changes in your application please describe in more details
> > what it tries to accomplish using IN_DELETE and the ekernel your application
> > is running on and then I may be able to recommend a more reliable method.
>
> I can discuss it with our team and get more details on this but it may
> be pretty complicated and costly to change. My understanding is that
> these watched files are used as ID/version references for in-memory
> objects shared between multiple processes to synchronize state, and
> resynchronize when there are crashes or restarts. So they can be
> recreated in place with the same or a different content, and so their
> inode number or mtime etc. may not be useable as additonnal check.
>
> I think we can also have a very large number of these objects and files
> on some configurations, so waiting to see if we have a following
> IN_CREATE, or adding more checks/synchronization logic will probably
> have a significant impact at scale.
>
> And we're targeting 5.10 and building it with our own config.
>

I am not convinced that this is needed as a regression fix for stable
kernels, but since you are building your own kernel I can help you do the
backport - attached *untested* patch for v5.10 that also backport of
"fsnotify: pass dentry instead of inode data".

Thanks,
Amir.
From 73c137d6a4cab445dec76c731b2c763d45af068e Mon Sep 17 00:00:00 2001
From: Amir Goldstein <amir73il@xxxxxxxxx>
Date: Thu, 4 Feb 2021 09:11:11 +0200
Subject: [PATCH] inotify: optionally invalidate dcache before IN_DELETE event

Define a new data type to pass for event - FSNOTIFY_EVENT_DENTRY.
Use it to pass the dentry instead of d_inode, to allow d_drop()
before IN_DELETE event.

Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
---
 fs/notify/fsnotify.c             |  2 ++
 include/linux/fsnotify.h         |  4 ++--
 include/linux/fsnotify_backend.h | 16 ++++++++++++++++
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 8d3ad5ef2925..55fb96c44449 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -260,6 +260,8 @@ static int fsnotify_handle_event(struct fsnotify_group *group, __u32 mask,
 		child_mark = NULL;
 		dir = NULL;
 		name = NULL;
+	} else if (inode_mark->mask & FS_EXCL_UNLINK && mask & FS_DELETE) {
+		d_drop(fsnotify_data_dentry(data, data_type));
 	}
 
 	ret = ops->handle_inode_event(inode_mark, mask, inode, dir, name);
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index f8acddcf54fb..aa35f261cc3d 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -36,7 +36,7 @@ static inline void fsnotify_name(struct inode *dir, __u32 mask,
 static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry,
 				   __u32 mask)
 {
-	fsnotify_name(dir, mask, d_inode(dentry), &dentry->d_name, 0);
+	fsnotify(mask, dentry, FSNOTIFY_EVENT_DENTRY, dir, &dentry->d_name, NULL, 0);
 }
 
 static inline void fsnotify_inode(struct inode *inode, __u32 mask)
@@ -77,7 +77,7 @@ static inline int fsnotify_parent(struct dentry *dentry, __u32 mask,
  */
 static inline void fsnotify_dentry(struct dentry *dentry, __u32 mask)
 {
-	fsnotify_parent(dentry, mask, d_inode(dentry), FSNOTIFY_EVENT_INODE);
+	fsnotify_parent(dentry, mask, dentry, FSNOTIFY_EVENT_DENTRY);
 }
 
 static inline int fsnotify_file(struct file *file, __u32 mask)
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index f8529a3a2923..c67d0d74254e 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -250,6 +250,7 @@ enum fsnotify_data_type {
 	FSNOTIFY_EVENT_NONE,
 	FSNOTIFY_EVENT_PATH,
 	FSNOTIFY_EVENT_INODE,
+	FSNOTIFY_EVENT_DENTRY,
 };
 
 static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
@@ -257,6 +258,8 @@ static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
 	switch (data_type) {
 	case FSNOTIFY_EVENT_INODE:
 		return (struct inode *)data;
+	case FSNOTIFY_EVENT_DENTRY:
+		return d_inode(data);
 	case FSNOTIFY_EVENT_PATH:
 		return d_inode(((const struct path *)data)->dentry);
 	default:
@@ -264,6 +267,19 @@ static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
 	}
 }
 
+static inline struct dentry *fsnotify_data_dentry(const void *data, int data_type)
+{
+	switch (data_type) {
+	case FSNOTIFY_EVENT_DENTRY:
+		/* Non const is needed for dget() */
+		return (struct dentry *)data;
+	case FSNOTIFY_EVENT_PATH:
+		return ((const struct path *)data)->dentry;
+	default:
+		return NULL;
+	}
+}
+
 static inline const struct path *fsnotify_data_path(const void *data,
 						    int data_type)
 {
-- 
2.16.5

From d4d91d5b46c13def48b36bb7734b5fca8e1c2127 Mon Sep 17 00:00:00 2001
From: Amir Goldstein <amir73il@xxxxxxxxx>
Date: Sun, 16 Jan 2022 10:11:45 +0200
Subject: [PATCH] inotify: optionally invalidate dcache before IN_DELETE event

Apparently, there are some applications that use IN_DELETE event as an
invalidation mechanism and expect that if they try to open a file with
the name reported with the delete event, that it should not contain the
content of the deleted file.

Commit 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of
d_delete()") moved the fsnotify delete hook before d_delete()
intentionally, so fsnotify will have access to a positive dentry.

This allowed a race where opening the deleted file via cached dentry
is now possible after receiving the IN_DELETE event.

In order to provide the "invalidate" functionality without changing
behavior for other workload, overload the original meaning of the
IN_EXCL_UNLINK flag with the additional meaning that the file should
not be accessed via that name after receiving an IN_DELETE event and
drop dentry from cache before IN_DELETE is reported.

This will allow applications to opt-in for the "invalidate" behavior.

Link: https://lore.kernel.org/linux-fsdevel/YeNyzoDM5hP5LtGW@visor/
Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
---
 fs/notify/fsnotify.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 4034ca566f95..86337220f25c 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -255,9 +255,21 @@ static int fsnotify_handle_inode_event(struct fsnotify_group *group,
 	if (WARN_ON_ONCE(!inode && !dir))
 		return 0;
 
-	if ((inode_mark->mask & FS_EXCL_UNLINK) &&
-	    path && d_unlinked(path->dentry))
-		return 0;
+	if (inode_mark->mask & FS_EXCL_UNLINK) {
+		if (path && d_unlinked(path->dentry))
+			return 0;
+
+		/*
+		 * File may be unlinked in filesystem, but still accesible via
+		 * dcache, so user may be able to observe and open the file
+		 * after receiving an IN_DELETE event.
+		 * Overload the original meaning of the IN_EXCL_UNLINK flag
+		 * with the additional meaning that the file cannot be accessed
+		 * via that name after receiving an IN_DELETE event.
+		 */
+		if (mask & FS_DELETE)
+			d_drop(fsnotify_data_dentry(data, data_type));
+	}
 
 	/* Check interest of this mark in case event was sent with two marks */
 	if (!(mask & inode_mark->mask & ALL_FSNOTIFY_EVENTS))
-- 
2.34.1


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux