On Mon, Jan 15, 2018 at 3:41 PM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote: >> >> 在 2018年1月15日,下午6:19,Miklos Szeredi <miklos@xxxxxxxxxx> 写道: >> >> On Mon, Jan 15, 2018 at 11:10 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote: >>> On Mon, Jan 15, 2018 at 11:29 AM, Miklos Szeredi <miklos@xxxxxxxxxx> wrote: >>>> On Mon, Jan 15, 2018 at 8:33 AM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote: >>>>> Currently syncfs(2) call on overlayfs just simply call sync_filesystem() >>>>> on upper_sb to synchronize whole dirty inodes in upper filesystem >>>>> regardless of the overlay ownership of the inode. In the use case of >>>>> container, when multiple containers using the same underlying upper >>>>> filesystem, it has some shortcomings as below. >>>>> >>>>> (1) Performance >>>>> Synchronization is probably heavy because it actually syncs unnecessary >>>>> inodes for target overlayfs. >>>>> >>>>> (2) Interference >>>>> Unplanned synchronization will probably impact IO performance of >>>>> irrelative container processes on the other overlayfs. >>>>> >>>>> This patch iterates upper inode list in overlayfs to only sync target >>>>> dirty inodes and wait for completion. By doing this, It is able to reduce >>>>> cost of synchronization and will not seriously impact IO performance of >>>>> irrelative processes. In special case, when having very few dirty inodes >>>>> and a lot of clean upper inodes in overlayfs, then iteration may waste >>>>> time so that synchronization is slower than before, but we think the >>>>> potential for performance improvement far surpass the potential for >>>>> performance regression in most cases. >>>>> >>>>> Signed-off-by: Chengguang Xu <cgxu519@xxxxxxxxxx> >>>>> --- >>>>> Changes since v3: >>>>> - Introduce upper_inode list to organize ovl indoe which has upper inode. >>>> >>>> Where is the inode removed from the list? Should be done in >>>> ovl_destroy_inode(). >>>> >>>> Problem is: we can drop the overlay inode even when the underlying >>>> inode is dirty, so we need to sync the upper inode before letting it >>>> go in ovl_destroy_inode(). >>>> >>> >>> Isn't that a bit counter to the lazy inode writeback approach? >> >> Not really. The dcache still keeps inodes and cache in memory, until >> there's memory pressure to get rid of the dentry itself. >> >> So this is about syncing the upper inode when the overlay dentry is >> flushed out of the dcache, which should not interfere with lazy >> writeback in general. >> >>> Won't this cause a change of behavior, e.g. by forcing data sync before >>> returning from close of file after a large write? >> >> No, file keeps a ref on dentry, which keeps a ref on inode + >> underlying dentries. >> > > Hi Miklos, Amir > > Thanks for your comments for the patch. > > If the speed of ovl inode destruction is not serious problem, I would like to > implement like below, what do you think? > > In the process of ovl inode destruction: > (I will introduce evict operation and do below oprations there) > > 1. start syncing inode. > 2. add current inode to sb->s_inodes_wb list > 3. take sb->s_sync_lock and wait until all inode in the wait list finishing syncing. > #Of course, we can only wait for specific inode, but is that really helping much? > > Maybe in the worst case, inode destruction will take time for the first inode, > but I think it is not common for all inodes. The purpose of adding inode to > sb->s_inodes_wb list is for data consistency of sync result, because I’m afraid that > inode destruction will let dirty inode escaping from sync waiting list. > I don't see a flaw in this plan, but here is another alternative plan to consider: Implement drop_inode() operation (not evict_inode) as follows: if inode->i_stat & I_WILL_FREE: return 1 /* ok to evict */ else if upper is dirty: increment i_refcount inode->i_stat |= I_WILL_FREE add to sb->s_inodes_wb list return 0 /* don't evict */ else: return 1 /* ok to evict */ If I am not wrong, this will let inode live a little longer with a single refcount on sb->s_inodes_wb list until a later umount or syncfs syncs the sb->s_inodes_wb list and drops the last refcount. This way on memory pressure, when many overlay inodes are evicted, we do not sync and wait on them one at a time, but like most fs, we sync and then wait on all of them on sync_filesystem() call. Hope I am not sending you on a dangerous adventure, so maybe consider the simple and slower implementation before trying this optimization. Thanks, Amir. -- To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html