Re: [PATCH v2] ovl: require xwhiteout feature flag on layer roots

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 19, 2024 at 10:30 PM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>
> On Fri, 19 Jan 2024 at 20:06, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> > How about checking xwhiteouts xattrs along with impure and
> > origin xattrs in ovl_get_inode()?
> >

That was not a very good suggestion on my part.
ovl_get_inode() only checks impure xattr on upper dir and origin xattr
on non-merge non-multi lower dir.

If we change the location of xwhiteout xattr check, it should be in
ovl_lookup_single() next to checking opaque xattr, which makes
me think - hey, why don't we overload opaque xattr, just like we
did with metacopy xattr?

An overlay.opaque xattr with empty string means "may have xwhiteouts"
and that is backward compatible, because ovl_is_opaquedir() checks
for xattr of length 1.

The only extra getxattr() needed would be for the d->last case...

> > Then there will be no overhead in readdir and no need for
> > marking the layer root?
> >
> > Miklos, would that be acceptable?
>
> It's certainly a good idea, but doesn't really address my worry.  The
> minor performance impact is not what bothers me most.  It's the fact
> that in the common case the result of these calls are discarded.
> That's just plain ugly, IMO.

...so the question boils down to, whether you find it too ugly
to always getxattr(opaque) on lookup of the last lower layer and
whether you find the overloading of opaque xattr too hacky?

As a precedent, we *always* check metacopy xattr in last lower layer
to check for error conditions, even if user did not opt-in to metacopy
at all, while we could just as easily have ignored it.

>
> My preferred alternative would be a mount option.  Amir, Alex, would
> you both be okay with that?
>

I think I had suggested that escaped private xattrs would also require
an opt-in mount option, but Alex explained that the users mounting the
overlay are not always aware of the fact that the layers were composed
this way, but I admit that I do not remember all the exact details.

Alex, do I remember correctly that the overlay instance where xwhiteouts
needs to be checked does NOT necessarily have a lowerdata layers?
The composefs instance with lowerdata layers is the one exposing the
(escaped) xwhiteout entries as xwhiteouts. Is that correct?

Is there even a use case for xwhiteouts NOT inside another lower
overlayfs?

If we limit the check for xwhiteouts only to nested overlayfs, then maybe
Miklos will care less about an extra getxattr on lookup?

Attached patch implements both xwhiteout and opaque checks during
lookup - we can later choose only one of them to keep.

Note that is changes the optimization to per-dentry, not per-layer,
so in the common case (no layers have xwhiteouts) xwhiteouts will not
be checked, but if xwhiteouts exist in any lower layer in the stack, then
xwhiteouts will be checked in all the layers of the merged dir (*).

(*) still need to optimize away lookup of xwhiteouts in upperdir.

Let me know what you think.

Thanks,
Amir.
From 7fa21951647ae24296cd0602f1291d00a8fa84d0 Mon Sep 17 00:00:00 2001
From: Amir Goldstein <amir73il@xxxxxxxxx>
Date: Fri, 9 Dec 2022 11:50:26 +0200
Subject: [PATCH v3] fsnotify: optimize the case of no content event watchers

Commit e43de7f0862b ("fsnotify: optimize the case of no marks of any type")
optimized the case where there are no fsnotify watchers on any of the
filesystem's objects.

It is quite common for a system to have a single local filesystem and
it is quite common for the system to have some inotify watches on some
config files or directories, so the optimization of no marks at all is
often not in effect.

Content event (i.e. access,modify) watchers on sb/mount more rare, so
optimizing the case of no sb/mount marks with content events can improve
performance for more systems, especially for performance sensitive io
workloads.

Unless a parent inode is watching, check for content events in masks of
sb/mount/inode masks early to optimize out the code in __fsnotify_parent()
and fsnotify() for fsnotify access/modify hooks.

Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
---
 fs/notify/fsnotify.c             | 26 ++++++++++++++++++++++++++
 fs/notify/mark.c                 | 26 +++++++++++++++++++++++++-
 include/linux/fs.h               |  1 +
 include/linux/fsnotify.h         | 21 ++++++++++++---------
 include/linux/fsnotify_backend.h | 13 +++++++++++++
 5 files changed, 77 insertions(+), 10 deletions(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 7974e91ffe13..b96251cdd165 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -237,6 +237,32 @@ int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
 }
 EXPORT_SYMBOL_GPL(__fsnotify_parent);
 
+/* Notify sb/mount/inode/parent watchers of this path */
+int fsnotify_path(const struct path *path, __u32 mask)
+{
+	struct dentry *dentry = path->dentry;
+
+	if (!fsnotify_sb_has_watchers(dentry->d_sb))
+		return 0;
+
+	/* Optimize the likely case of sb/mount/inode not watching content */
+	if (mask & FSNOTIFY_CONTENT_EVENTS &&
+	    likely(!(dentry->d_flags & DCACHE_FSNOTIFY_PARENT_WATCHED))) {
+		__u32 marks_mask = d_inode(dentry)->i_fsnotify_mask |
+				   real_mount(path->mnt)->mnt_fsnotify_mask |
+				   dentry->d_sb->s_fsnotify_mask;
+
+		if (!(mask & marks_mask))
+			return 0;
+	}
+
+	/*
+	 * fsnotify_parent() checks the event masks of sb/mount/inode/parent.
+	 */
+	return fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH);
+}
+EXPORT_SYMBOL_GPL(fsnotify_path);
+
 static int fsnotify_handle_inode_event(struct fsnotify_group *group,
 				       struct fsnotify_mark *inode_mark,
 				       u32 mask, const void *data, int data_type,
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index d6944ff86ffa..d0e208913881 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -153,9 +153,30 @@ static struct inode *fsnotify_update_iref(struct fsnotify_mark_connector *conn,
 	return inode;
 }
 
+/*
+ * To avoid the performance penalty of rare case of sb/mount content event
+ * watchers in the hot io path, taint sb if such watchers are added.
+ */
+static void fsnotify_update_sb_watchers(struct fsnotify_mark_connector *conn,
+					u32 old_mask, u32 new_mask)
+{
+	struct super_block *sb = fsnotify_connector_sb(conn);
+	u32 new_watchers = new_mask & ~old_mask & FSNOTIFY_CONTENT_EVENTS;
+
+	if (conn->type == FSNOTIFY_OBJ_TYPE_INODE || !sb || !new_watchers)
+		return;
+
+	/*
+	 * TODO: We need to take sb conn->lock to set FS_MNT_CONTENT_WATCHED
+	 * in sb->s_fsnotify_mask, but if this is a recalc of mount mark mask,
+	 * it is not sure that we have an sb connector attached yet.
+	 */
+	sb->s_iflags |= SB_I_CONTENT_WATCHED;
+}
+
 static void *__fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
 {
-	u32 new_mask = 0;
+	u32 old_mask, new_mask = 0;
 	bool want_iref = false;
 	struct fsnotify_mark *mark;
 
@@ -163,6 +184,7 @@ static void *__fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
 	/* We can get detached connector here when inode is getting unlinked. */
 	if (!fsnotify_valid_obj_type(conn->type))
 		return NULL;
+	old_mask = fsnotify_conn_mask(conn);
 	hlist_for_each_entry(mark, &conn->list, obj_list) {
 		if (!(mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED))
 			continue;
@@ -173,6 +195,8 @@ static void *__fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
 	}
 	*fsnotify_conn_mask_p(conn) = new_mask;
 
+	fsnotify_update_sb_watchers(conn, old_mask, new_mask);
+
 	return fsnotify_update_iref(conn, want_iref);
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e6ba0cc6f2ee..dac36fe139e1 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1173,6 +1173,7 @@ extern int send_sigurg(struct fown_struct *fown);
 #define SB_I_TS_EXPIRY_WARNED 0x00000400 /* warned about timestamp range expiry */
 #define SB_I_RETIRED	0x00000800	/* superblock shouldn't be reused */
 #define SB_I_NOUMASK	0x00001000	/* VFS does not apply umask */
+#define SB_I_CONTENT_WATCHED 0x00002000 /* fsnotify file content monitor */
 
 /* Possible states of 'frozen' field */
 enum {
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 11e6434b8e71..2e0f47648a8b 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -17,6 +17,12 @@
 #include <linux/slab.h>
 #include <linux/bug.h>
 
+/* Are there any inode/mount/sb objects that are being watched? */
+static inline bool fsnotify_sb_has_watchers(struct super_block *sb)
+{
+	return atomic_long_read(&sb->s_fsnotify_connectors);
+}
+
 /*
  * Notify this @dir inode about a change in a child directory entry.
  * The directory entry may have turned positive or negative or its inode may
@@ -30,7 +36,7 @@ static inline int fsnotify_name(__u32 mask, const void *data, int data_type,
 				struct inode *dir, const struct qstr *name,
 				u32 cookie)
 {
-	if (atomic_long_read(&dir->i_sb->s_fsnotify_connectors) == 0)
+	if (!fsnotify_sb_has_watchers(dir->i_sb))
 		return 0;
 
 	return fsnotify(mask, data, data_type, dir, name, NULL, cookie);
@@ -44,7 +50,7 @@ static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry,
 
 static inline void fsnotify_inode(struct inode *inode, __u32 mask)
 {
-	if (atomic_long_read(&inode->i_sb->s_fsnotify_connectors) == 0)
+	if (!fsnotify_sb_has_watchers(inode->i_sb))
 		return;
 
 	if (S_ISDIR(inode->i_mode))
@@ -59,9 +65,6 @@ static inline int fsnotify_parent(struct dentry *dentry, __u32 mask,
 {
 	struct inode *inode = d_inode(dentry);
 
-	if (atomic_long_read(&inode->i_sb->s_fsnotify_connectors) == 0)
-		return 0;
-
 	if (S_ISDIR(inode->i_mode)) {
 		mask |= FS_ISDIR;
 
@@ -86,18 +89,18 @@ static inline int fsnotify_parent(struct dentry *dentry, __u32 mask,
  */
 static inline void fsnotify_dentry(struct dentry *dentry, __u32 mask)
 {
+	if (!fsnotify_sb_has_watchers(dentry->d_sb))
+		return;
+
 	fsnotify_parent(dentry, mask, dentry, FSNOTIFY_EVENT_DENTRY);
 }
 
 static inline int fsnotify_file(struct file *file, __u32 mask)
 {
-	const struct path *path;
-
 	if (file->f_mode & FMODE_NONOTIFY)
 		return 0;
 
-	path = &file->f_path;
-	return fsnotify_parent(path->dentry, mask, path, FSNOTIFY_EVENT_PATH);
+	return fsnotify_path(&file->f_path, mask);
 }
 
 /*
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 7f63be5ca0f1..4e098bdeb3ca 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -66,6 +66,11 @@
 #define FS_RENAME		0x10000000	/* File was renamed */
 #define FS_DN_MULTISHOT		0x20000000	/* dnotify multishot */
 #define FS_ISDIR		0x40000000	/* event occurred against dir */
+/*
+ * This flag is set in the object interest mask of sb to indicate that
+ * some mount mark is interested to get content events.
+ */
+#define FS_MNT_CONTENT_WATCHED	0x80000000
 
 #define FS_MOVE			(FS_MOVED_FROM | FS_MOVED_TO)
 
@@ -77,6 +82,13 @@
  */
 #define ALL_FSNOTIFY_DIRENT_EVENTS (FS_CREATE | FS_DELETE | FS_MOVE | FS_RENAME)
 
+/* Content events can be used to inspect file content */
+#define FSNOTIFY_CONTENT_PERM_EVENTS	(FS_ACCESS_PERM)
+
+/* Content events can be used to monitor file content */
+#define FSNOTIFY_CONTENT_EVENTS		(FS_ACCESS | FS_MODIFY | \
+					 FSNOTIFY_CONTENT_PERM_EVENTS)
+
 #define ALL_FSNOTIFY_PERM_EVENTS (FS_OPEN_PERM | FS_ACCESS_PERM | \
 				  FS_OPEN_EXEC_PERM)
 
@@ -541,6 +553,7 @@ struct fsnotify_mark {
 extern int fsnotify(__u32 mask, const void *data, int data_type,
 		    struct inode *dir, const struct qstr *name,
 		    struct inode *inode, u32 cookie);
+extern int fsnotify_path(const struct path *path, __u32 mask);
 extern int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void *data,
 			   int data_type);
 extern void __fsnotify_inode_delete(struct inode *inode);
-- 
2.34.1


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux