>
> On Jan 23, 2018, at 2:45 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> On Mon, Jan 22, 2018 at 6:00 PM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>>
>>> On Jan 22, 2018, at 8:58 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>>>
>>> On Mon, Jan 22, 2018 at 2:05 PM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>>>>>
>>>>> On Jan 22, 2018, at 7:27 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>>>>>
>>>>> On Mon, Jan 22, 2018 at 11:47 AM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>>>>>> Currently a syncfs(2) call on overlayfs simply calls sync_filesystem()
>>>>>> on upper_sb to synchronize all dirty inodes in the upper filesystem,
>>>>>> regardless of the overlay ownership of the inodes. In the container
>>>>>> use case, when multiple containers share the same underlying upper
>>>>>> filesystem, this has the following shortcomings.
>>>>>>
>>>>>> (1) Performance
>>>>>> Synchronization can be heavy because it also syncs inodes that are
>>>>>> unrelated to the target overlayfs.
>>>>>>
>>>>>> (2) Interference
>>>>>> Unplanned synchronization will likely hurt the IO performance of
>>>>>> unrelated container processes on the other overlayfs instances.
>>>>>>
>>>>>> This patch iterates the upper inode list in overlayfs to sync only
>>>>>> the target dirty inodes and wait for completion. This reduces the
>>>>>> cost of synchronization and avoids seriously impacting the IO
>>>>>> performance of unrelated processes. In the special case where an
>>>>>> overlayfs has very few dirty inodes but many clean upper inodes, the
>>>>>> iteration may waste time and synchronization could be slower than
>>>>>> before, but we think the potential for performance improvement far
>>>>>> outweighs the potential for regression in most cases.
>>>>>>
>>>>>> Signed-off-by: Chengguang Xu <cgxu519@xxxxxxxxxx>
>>>>>> ---
>>>>>> Changes since v5:
>>>>>> - Move sync related functions to the new file sync.c.
>>>>>> - Introduce the OVL_EVICT_PENDING flag to avoid race conditions.
>>>>>> - If the ovl inode flag is I_WILL_FREE or I_FREEING, syncfs
>>>>>>   will wait until OVL_EVICT_PENDING is cleared.
>>>>>> - Move the inode sync operation into evict_inode from drop_inode.
>>>>>> - Call upper_sb->sync_fs with no-wait after syncing the dirty upper
>>>>>>   inodes.
>>>>>> - Hold s_inode_wblist_lock until the item is deleted from the list.
>>>>>> - Remove the write_inode function because it is no longer used.
>>>>>>
> [...]
>>>>>
>>>>>
>>>>>
>>>>>> +       /* We need to do this as 'inode' can be freed by now */
>>>>>
>>>>> Could oi_prev have been freed or removed from the upper_inodes list?
>>>>> I don't see why not.
>>>>> You could try to continue iteration from iput_inode if it is not NULL,
>>>>> but what if it is NULL?
>>>>
>>>> When iterating the list we either get a reference or wait until the
>>>> item gets deleted, so I think there is no chance for the previous
>>>> inode to become NULL.
>>>>
>>>
>>> Maybe I am missing something.
>>> When iterating the upper_inodes list you get a reference on the current
>>> inode (oi) and maybe hold a reference on the previous one via iput_inode.
>>> The previous inode may have a reference from the wb_list, but that
>>> reference can be lost while we are not holding the wblist lock. No?
>>
>> The reference on the previous inode can be lost if the inode is not our
>> target, but that can only happen after iput(iput_inode); before that it
>> is safe.
>>
>
> I was talking about the case where iput_inode is NULL.

If iput_inode is NULL, the previous inode will not put its reference until
the wait finishes, so this case is safer than when iput_inode has a value.
Meanwhile, if an inode holds a reference, it won't get deleted from the list.

>
>> When deleting an item from the list we take the lock to protect against
>> concurrent operations, so if we have already taken the list lock, it is
>> safe to touch items in the list.
>>
>
> But we have let go of the list lock.

I'm not sure what you are worried about; could you be more specific?

Thanks,
Chengguang