On Fri, Jan 12, 2018 at 9:13 AM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>>
>> On Jan 11, 2018, at 6:07 PM, Jan Kara <jack@xxxxxxx> wrote:
>>
>> On Thu 11-01-18 05:30:33, Amir Goldstein wrote:
>>> [CC: Jan Kara]
>>>
>>> On Thu, Jan 11, 2018 at 3:48 AM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>>>> Currently a syncfs(2) call on overlayfs simply calls sync_filesystem()
>>>> on upper_sb to synchronize all dirty inodes in the upper filesystem,
>>>> regardless of whether the overlay owns the inode. This has obvious
>>>> shortcomings:
>>>>
>>>> (1) Performance
>>>> Synchronization is probably heavy in most cases, especially when the
>>>> upper filesystem is not dedicated to the target overlayfs.
>>>>
>>>> (2) Interference
>>>> Unplanned synchronization will likely impact the I/O performance of
>>>> processes which use other overlayfs instances on the same upper
>>>> filesystem, or which use the upper filesystem directly.
>>>>
>>>> This patch iterates overlay inodes to sync only the target dirty inodes
>>>> in the upper filesystem. This reduces the cost of synchronization and
>>>> avoids seriously impacting the I/O performance of unrelated processes.
>>>>
>>>> Signed-off-by: Chengguang Xu <cgxu519@xxxxxxxxxx>
>>>
>>> Looks good.
>>>
>>> Reviewed-by: Amir Goldstein <amir73il@xxxxxxxxx>
>>>
>>> Jan,
>>>
>>> Would you mind taking a quick look at this?
>>>
>>> Overlayfs inodes have no dirty mapping, only the underlying upper inodes
>>> do, so sync_inodes_sb() is not enough for overlay syncfs().
>>>
>>> This code creates the list of upper inodes to sync, syncs them and then
>>> calls the underlying upper fs sync_fs().
>>>
>>> Does this look reasonable?
>>
>> If I understand the original changelog, this is a performance-motivated
>> change. As such it should have some measurements accompanying it to
>> justify the change. I.e., how much does this help? What is the exact setup
>> where this matters? Do users really care?
>>
>> More technical comments inline.
>>
>>>> @@ -80,6 +83,8 @@ struct ovl_inode {
>>>>
>>>>         /* synchronize copy up and more */
>>>>         struct mutex lock;
>>>> +       /* ovl inode sync list */
>>>> +       struct list_head sync_list;
>>>> };
>>
>> Additional overhead of 16 bytes per inode - not a huge deal since this is
>> overlay specific but it's something to keep in mind. Could we not reuse
>> i_io_list of the overlay inode instead?
>
> Hi Jan,
>
> Thanks for your review and comments.
> Actually I considered i_io_list in my very first version; however, using
> it implies that the inode is attached to a proper bdi_writeback, which
> overlay does not have, so it might be incompatible in some contexts (for
> example, evict). Maybe we can reuse i_wb_list instead.
>
>
>>
>>>> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
>>>> index 76440fe..947bb83 100644
>>>> --- a/fs/overlayfs/super.c
>>>> +++ b/fs/overlayfs/super.c
>> ...
>>>> +static void ovl_sync_inodes(struct super_block *sb)
>>>> +{
>>>> +       struct ovl_fs *ofs = sb->s_fs_info;
>>>> +       struct inode *inode, *i_next;
>>>> +       struct inode *upper_inode;
>>>> +       struct blk_plug plug;
>>>> +
>>>> +       struct writeback_control wbc = {
>>>> +               .sync_mode      = WB_SYNC_ALL,
>>>> +               .for_sync       = 1,
>>>> +               .range_start    = 0,
>>>> +               .range_end      = LLONG_MAX,
>>>> +               .nr_to_write    = LONG_MAX,
>>>> +       };
>>>> +
>>>> +       blk_start_plug(&plug);
>>>> +       spin_lock(&sb->s_inode_list_lock);
>>>> +       list_for_each_entry_safe(inode, i_next, &sb->s_inodes, i_sb_list) {
>>
>> Why a _safe iteration variant here? If you hope it allows you to safely
>> resume iteration after dropping sb->s_inode_list_lock, you are wrong,
>> because the i_next inode can be removed from the list once you drop that
>> lock as well. You need to be more careful when iterating inodes. See how
>> e.g. fs/notify/fsnotify.c:fsnotify_unmount_inodes() plays tricks with
>> always keeping one inode reference to be able to resume iteration of the
>> inode list.
>>
>> Another issue is that this will iterate all inodes on the overlayfs list.
>> Both clean and dirty ones (well, those where the underlying upper inode
>> is dirty). When there are only a few underlying dirty inodes, this may be
>> considerably slower than the original behavior. So this needs an
>> explanation in the changelog of why the use case you care about is more
>> important than the use case this regresses. You might think that this
>> only adds an iteration of an in-memory list, but on large machines this
>> list may have millions of entries, so its iteration is not exactly fast.
>>
>>>> +               upper_inode = ovl_inode_upper(inode);
>>>> +               if (!upper_inode)
>>>> +                       continue;
>>>> +
>>>> +               spin_lock(&upper_inode->i_lock);
>>>> +               if (upper_inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE) ||
>>>> +                   list_empty(&upper_inode->i_io_list)) {
>>>> +                       spin_unlock(&upper_inode->i_lock);
>>>> +                       continue;
>>>> +               }
>>>> +               spin_unlock(&upper_inode->i_lock);
>>>> +
>>>> +               if (!igrab(inode))
>>>> +                       continue;
>>>> +
>>>> +               if (!igrab(upper_inode)) {
>>>> +                       iput(inode);
>>
>> iput() under a spinlock? Not good, that always needs a very good
>> explanation of why this cannot be the last reference to drop. Preferably
>> avoid this altogether.
>
> Hi Amir,
>
> I reckon the upper inode won't be deleted before the overlay inode, so we
> can just hold a reference on the overlay inode here, am I right?
>

You are right that the overlay inode holds a reference to the upper inode,
so you don't need to grab upper_inode. However, I think you may need to
grab the overlay inode reference before trying to dereference
ovl_inode_upper(inode), so you are back to square one, having to
iput(inode) if (!upper_inode).

I'm afraid I don't know enough to propose a solution. I suggest you start
with Jan's suggestion to look at fsnotify_unmount_inodes().

Thanks,
Amir.
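[Editor's note] For readers following this thread, the reference-keeping iteration that Jan points to in fsnotify_unmount_inodes() could look roughly like the sketch below. This is a non-compilable illustration against kernel-internal APIs of that era (igrab()/__iget()/iput(), sb->s_inode_list_lock); the helper ovl_sync_one_upper() is hypothetical and stands in for the per-inode writeback call. Note that __iget() (not igrab()) is used under i_lock, since igrab() takes i_lock itself.

```c
/*
 * Sketch only: iterate sb->s_inodes while pinning the current overlay
 * inode so iteration can resume after s_inode_list_lock is dropped,
 * in the style of fsnotify_unmount_inodes().
 */
static void ovl_sync_inodes(struct super_block *sb)
{
	struct inode *inode, *iput_inode = NULL;

	spin_lock(&sb->s_inode_list_lock);
	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
		/* skip inodes that are going away or not yet set up */
		spin_lock(&inode->i_lock);
		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
			spin_unlock(&inode->i_lock);
			continue;
		}
		/* pin the overlay inode; __iget() is safe under i_lock */
		__iget(inode);
		spin_unlock(&inode->i_lock);

		/*
		 * The pinned inode stays on s_inodes, keeping our list
		 * position valid while the lock is dropped.
		 */
		spin_unlock(&sb->s_inode_list_lock);

		/* drop the previous inode's reference outside the lock */
		iput(iput_inode);

		/* the overlay inode pins its upper inode: no igrab() needed */
		if (ovl_inode_upper(inode))
			ovl_sync_one_upper(ovl_inode_upper(inode));

		iput_inode = inode;
		spin_lock(&sb->s_inode_list_lock);
	}
	spin_unlock(&sb->s_inode_list_lock);
	iput(iput_inode);
}
```

Deferring iput() into the next loop pass (and once after the loop) keeps it outside both spinlocks, addressing Jan's "iput() under a spinlock" objection, since the final iput() may evict the inode and sleep.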