On Jan 11, 2018, at 6:07 PM, Jan Kara <jack@xxxxxxx> wrote:
>
> On Thu 11-01-18 05:30:33, Amir Goldstein wrote:
>> [CC: Jan Kara]
>>
>> On Thu, Jan 11, 2018 at 3:48 AM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>>> Currently a syncfs(2) call on overlayfs simply calls sync_filesystem()
>>> on the upper_sb to synchronize all dirty inodes in the upper
>>> filesystem, regardless of which overlayfs instance the inodes belong
>>> to. This has the obvious shortcomings below.
>>>
>>> (1) Performance
>>> Synchronization is probably heavy in most cases, especially when the
>>> upper filesystem is not dedicated to the target overlayfs.
>>>
>>> (2) Interference
>>> Unplanned synchronization will probably impact the IO performance of
>>> processes that use other overlayfs instances on the same upper
>>> filesystem, or that use the upper filesystem directly.
>>>
>>> This patch iterates over the overlay inodes and syncs only the target
>>> dirty inodes in the upper filesystem. By doing this, it reduces the
>>> cost of synchronization and does not seriously impact the IO
>>> performance of unrelated processes.
>>>
>>> Signed-off-by: Chengguang Xu <cgxu519@xxxxxxxxxx>
>>
>> Looks good.
>>
>> Reviewed-by: Amir Goldstein <amir73il@xxxxxxxxx>
>>
>> Jan,
>>
>> Would you mind taking a quick look at this?
>>
>> Overlayfs inodes have no dirty mapping, only the underlying upper
>> inodes do, so sync_inodes_sb() is not enough for overlay syncfs().
>>
>> This code creates the list of upper inodes to sync, syncs them and
>> then calls the underlying upper fs sync_fs().
>>
>> Does this look reasonable?
>
> If I understand the original changelog correctly, this is a
> performance-motivated change. As such, it should have some measurements
> accompanying it to justify the change. I.e., how much does this help?
> What is the exact setup where this matters? Do users really care?
>
> More technical comments inline.
>
>>> @@ -80,6 +83,8 @@ struct ovl_inode {
>>>
>>>  	/* synchronize copy up and more */
>>>  	struct mutex lock;
>>> +	/* ovl inode sync list */
>>> +	struct list_head sync_list;
>>>  };
>
> Additional overhead of 16 bytes per inode - not a huge deal since this
> is overlay specific, but it's something to keep in mind. Couldn't we
> reuse i_io_list of the overlay inode instead?

Hi Jan,

Thanks for your review and comments.

Actually, I considered using i_io_list in my very first version;
however, being on that list implies the inode is attached to a proper
bdi_writeback, which overlayfs does not have, so it might be
incompatible in some contexts (for example, evict). Maybe we can reuse
i_wb_list instead.

>
>>> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
>>> index 76440fe..947bb83 100644
>>> --- a/fs/overlayfs/super.c
>>> +++ b/fs/overlayfs/super.c
> ...
>>> +static void ovl_sync_inodes(struct super_block *sb)
>>> +{
>>> +	struct ovl_fs *ofs = sb->s_fs_info;
>>> +	struct inode *inode, *i_next;
>>> +	struct inode *upper_inode;
>>> +	struct blk_plug plug;
>>> +
>>> +	struct writeback_control wbc = {
>>> +		.sync_mode	= WB_SYNC_ALL,
>>> +		.for_sync	= 1,
>>> +		.range_start	= 0,
>>> +		.range_end	= LLONG_MAX,
>>> +		.nr_to_write	= LONG_MAX,
>>> +	};
>>> +
>>> +	blk_start_plug(&plug);
>>> +	spin_lock(&sb->s_inode_list_lock);
>>> +	list_for_each_entry_safe(inode, i_next, &sb->s_inodes, i_sb_list) {
>
> Why a _safe iteration variant here? If you hope it allows you to safely
> resume iteration after dropping sb->s_inode_list_lock, you are wrong,
> because the i_next inode can be removed from the list once you drop
> that lock as well. You need to be more careful when iterating inodes.
> See how e.g. fs/notify/fsnotify.c:fsnotify_unmount_inodes() plays
> tricks with always keeping one inode reference to be able to resume
> iteration of the inode list.
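For reference, the trick Jan points to looks roughly like this - a
minimal sketch modeled on fsnotify_unmount_inodes() from kernels of
this era, with the per-inode sync work elided; the function name is
made up for illustration and this is not the actual overlayfs patch:

static void ovl_sync_inodes_sketch(struct super_block *sb)
{
	struct inode *inode, *iput_inode = NULL;

	spin_lock(&sb->s_inode_list_lock);
	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
		spin_lock(&inode->i_lock);
		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
			spin_unlock(&inode->i_lock);
			continue;
		}
		/* Pin the inode so it cannot be evicted off s_inodes. */
		__iget(inode);
		spin_unlock(&inode->i_lock);
		spin_unlock(&sb->s_inode_list_lock);

		/* Drop the previous round's reference outside the locks. */
		iput(iput_inode);
		iput_inode = inode;

		/* ... sync the underlying upper inode here ... */

		/*
		 * Safe to resume: we still hold a reference on "inode",
		 * so it is guaranteed to remain on the i_sb_list.
		 */
		spin_lock(&sb->s_inode_list_lock);
	}
	spin_unlock(&sb->s_inode_list_lock);
	iput(iput_inode);	/* iput(NULL) is a no-op */
}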
> Another issue is that this will iterate all inodes on the overlayfs
> list, both clean and dirty ones (well, those whose underlying upper
> inode is dirty). When there are only a few underlying dirty inodes,
> this may be considerably slower than the original behavior. So the
> changelog needs to explain why the use case you care about is more
> important than the use case this regresses. You might think that this
> only adds an iteration of an in-memory list, but on large machines
> this list may have millions of entries, so its iteration is not
> exactly fast.
>
>>> +		upper_inode = ovl_inode_upper(inode);
>>> +		if (!upper_inode)
>>> +			continue;
>>> +
>>> +		spin_lock(&upper_inode->i_lock);
>>> +		if (upper_inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE) ||
>>> +		    list_empty(&upper_inode->i_io_list)) {
>>> +			spin_unlock(&upper_inode->i_lock);
>>> +			continue;
>>> +		}
>>> +		spin_unlock(&upper_inode->i_lock);
>>> +
>>> +		if (!igrab(inode))
>>> +			continue;
>>> +
>>> +		if (!igrab(upper_inode)) {
>>> +			iput(inode);
>
> iput() under a spinlock? Not good; that always needs a very good
> explanation of why this cannot be the last reference to drop.
> Preferably avoid this altogether.

Hi Amir,

I reckon the upper inode won't be deleted before the overlay inode, so
we can just hold a reference on the overlay inode here, am I right?

Thanks,
Chengguang.
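For illustration, if that lifetime assumption holds - and it is an
assumption this thread does not verify - the grab/put pairing in the
loop body could shrink to something like the following untested
fragment, which reuses the deferred iput_inode trick from the sketch
above and keeps iput() away from sb->s_inode_list_lock:

		/*
		 * Sketch only, assuming the overlay inode holds a
		 * reference on its upper inode for its whole lifetime:
		 * pinning the overlay inode is then enough to keep
		 * upper_inode stable, so the igrab(upper_inode) and the
		 * iput() under sb->s_inode_list_lock both go away.
		 */
		if (!igrab(inode))
			continue;
		spin_unlock(&sb->s_inode_list_lock);

		/* previous round's reference, dropped with no locks held */
		iput(iput_inode);
		iput_inode = inode;

		/* ... write back upper_inode's dirty pages here ... */

		spin_lock(&sb->s_inode_list_lock);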