On Thu 11-01-18 05:30:33, Amir Goldstein wrote:
> [CC: Jan Kara]
>
> On Thu, Jan 11, 2018 at 3:48 AM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
> > Currently a syncfs(2) call on overlayfs simply calls sync_filesystem()
> > on the upper_sb to synchronize all dirty inodes in the upper filesystem,
> > regardless of overlay ownership of the inode. This has obvious
> > shortcomings:
> >
> > (1) Performance
> > Synchronization is probably heavy in most cases, especially when the
> > upper filesystem is not dedicated to the target overlayfs.
> >
> > (2) Interference
> > Unplanned synchronization will probably impact the IO performance of
> > processes which use other overlayfs mounts on the same upper filesystem
> > or use the upper filesystem directly.
> >
> > This patch iterates overlay inodes to sync only the target dirty inodes
> > in the upper filesystem. This reduces the cost of synchronization and
> > does not seriously impact the IO performance of unrelated processes.
> >
> > Signed-off-by: Chengguang Xu <cgxu519@xxxxxxxxxx>
>
> Looks good.
>
> Reviewed-by: Amir Goldstein <amir73il@xxxxxxxxx>
>
> Jan,
>
> Would you mind taking a quick look at this?
>
> Overlayfs inodes have no dirty mapping, only the underlying upper inodes
> do, so sync_inodes_sb() is not enough for overlay syncfs().
>
> This code creates the list of upper inodes to sync, syncs them and then
> calls the underlying upper fs sync_fs().
>
> Does this look reasonable?

If I understand the original changelog correctly, this is a
performance-motivated change. As such it should have some measurements
accompanying it to justify the change, i.e. how much does this help? What
is the exact setup where this matters? Do users really care? More
technical comments inline.

> > @@ -80,6 +83,8 @@ struct ovl_inode {
> >
> >         /* synchronize copy up and more */
> >         struct mutex lock;
> > +       /* ovl inode sync list */
> > +       struct list_head sync_list;
> >  };

Additional overhead of 16 bytes per inode - not a huge deal since this is
overlay specific, but it's something to keep in mind. Could we not reuse
the i_io_list of the overlay inode instead?

> > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> > index 76440fe..947bb83 100644
> > --- a/fs/overlayfs/super.c
> > +++ b/fs/overlayfs/super.c
...
> > +static void ovl_sync_inodes(struct super_block *sb)
> > +{
> > +       struct ovl_fs *ofs = sb->s_fs_info;
> > +       struct inode *inode, *i_next;
> > +       struct inode *upper_inode;
> > +       struct blk_plug plug;
> > +
> > +       struct writeback_control wbc = {
> > +               .sync_mode      = WB_SYNC_ALL,
> > +               .for_sync       = 1,
> > +               .range_start    = 0,
> > +               .range_end      = LLONG_MAX,
> > +               .nr_to_write    = LONG_MAX,
> > +       };
> > +
> > +       blk_start_plug(&plug);
> > +       spin_lock(&sb->s_inode_list_lock);
> > +       list_for_each_entry_safe(inode, i_next, &sb->s_inodes, i_sb_list) {

Why the _safe iteration variant here? If you hope it allows you to safely
resume iteration after dropping sb->s_inode_list_lock, you are wrong,
because the i_next inode can be removed from the list once you drop that
lock as well. You need to be more careful when iterating inodes. See how
e.g. fs/notify/fsnotify.c:fsnotify_unmount_inodes() plays tricks with
always keeping one inode reference to be able to resume iteration of the
inode list.

Another issue is that this will iterate all inodes on the overlayfs list,
both clean and dirty ones (well, those whose underlying upper inode is
dirty). When there are only a few dirty upper inodes, this may be
considerably slower than the original behavior. So this needs an
explanation in the changelog why the use case you care about is more
important than the use case this regresses. You might think that this
will only add an iteration of an in-memory list, but on large machines
this list may have millions of entries, so its iteration is not exactly
fast.
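For illustration, here is a rough and completely untested sketch of the
reference-holding iteration I have in mind. The upper inode dirtiness
check, the blk_plug, and your sync_list bookkeeping are left out so the
locking pattern is easier to see, and note that iput() happens only after
all spinlocks have been dropped:

static void ovl_sync_inodes(struct super_block *sb)
{
        struct inode *inode, *iput_inode = NULL;
        struct writeback_control wbc = {
                .sync_mode      = WB_SYNC_ALL,
                .for_sync       = 1,
                .range_start    = 0,
                .range_end      = LLONG_MAX,
                .nr_to_write    = LONG_MAX,
        };

        spin_lock(&sb->s_inode_list_lock);
        list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
                struct inode *upper_inode = ovl_inode_upper(inode);

                if (!upper_inode)
                        continue;

                spin_lock(&inode->i_lock);
                if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
                        spin_unlock(&inode->i_lock);
                        continue;
                }
                /*
                 * Pin the overlay inode so it stays on sb->s_inodes and
                 * the iteration can be resumed from it after re-taking
                 * the list lock. I assume the overlay inode pins its
                 * upper dentry and thus upper_inode, so no separate
                 * igrab() of upper_inode here.
                 */
                __iget(inode);
                spin_unlock(&inode->i_lock);
                spin_unlock(&sb->s_inode_list_lock);

                /* Previous inode is dropped only outside the spinlocks. */
                iput(iput_inode);

                sync_inode(upper_inode, &wbc);

                iput_inode = inode;

                cond_resched();
                spin_lock(&sb->s_inode_list_lock);
        }
        spin_unlock(&sb->s_inode_list_lock);
        iput(iput_inode);
}

Where the waiting for the upper inodes (what you use ovl_sync_list for)
fits into this is a separate question.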
> > +               upper_inode = ovl_inode_upper(inode);
> > +               if (!upper_inode)
> > +                       continue;
> > +
> > +               spin_lock(&upper_inode->i_lock);
> > +               if (upper_inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE) ||
> > +                       list_empty(&upper_inode->i_io_list)) {
> > +                       spin_unlock(&upper_inode->i_lock);
> > +                       continue;
> > +               }
> > +               spin_unlock(&upper_inode->i_lock);
> > +
> > +               if (!igrab(inode))
> > +                       continue;
> > +
> > +               if (!igrab(upper_inode)) {
> > +                       iput(inode);

iput() under a spinlock? Not good, that always needs a very good
explanation why this cannot be the last reference to drop. Preferably
avoid this altogether.

> > +                       continue;
> > +               }
> > +               spin_unlock(&sb->s_inode_list_lock);
> > +
> > +               sync_inode(upper_inode, &wbc);
> > +               spin_lock(&ofs->ovl_sync_list_lock);
> > +               if (list_empty(&OVL_I(inode)->sync_list))
> > +                       list_add(&OVL_I(inode)->sync_list, &ofs->ovl_sync_list);
> > +               else {
> > +                       iput(upper_inode);
> > +                       iput(inode);

Again iput() under spinlock...

> > +               }
> > +               spin_unlock(&ofs->ovl_sync_list_lock);
> > +
> > +               if (need_resched()) {
> > +                       blk_finish_plug(&plug);
> > +                       cond_resched();
> > +                       blk_start_plug(&plug);
> > +               }
> > +               spin_lock(&sb->s_inode_list_lock);
> > +       }
> > +       spin_unlock(&sb->s_inode_list_lock);
> > +       blk_finish_plug(&plug);
> > +}

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR