Re: [PATCH v6] ovl: Improving syncfs efficiency

> 
> On Jan 23, 2018, at 2:39 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> 
> On Tue, Jan 23, 2018 at 2:24 AM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>>> 
>>> On Jan 23, 2018, at 2:45 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>>> 
>>> On Mon, Jan 22, 2018 at 6:00 PM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>>>> 
>>>>> On Jan 22, 2018, at 8:58 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>>>>> 
>>>>> On Mon, Jan 22, 2018 at 2:05 PM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>>>>>>> 
>>>>>>> On Jan 22, 2018, at 7:27 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>>>>>>> 
>>>>>>> On Mon, Jan 22, 2018 at 11:47 AM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>>>>>>>> Currently a syncfs(2) call on overlayfs simply calls sync_filesystem()
>>>>>>>> on the upper_sb to synchronize all dirty inodes in the upper filesystem,
>>>>>>>> regardless of which overlay instance owns each inode. In the container
>>>>>>>> use case, when multiple containers use the same underlying upper
>>>>>>>> filesystem, this has the shortcomings below.
>>>>>>>> 
>>>>>>>> (1) Performance
>>>>>>>> Synchronization can be heavy because it syncs inodes that are
>>>>>>>> unnecessary for the target overlayfs.
>>>>>>>> 
>>>>>>>> (2) Interference
>>>>>>>> Unplanned synchronization can hurt the IO performance of unrelated
>>>>>>>> container processes on other overlayfs instances.
>>>>>>>> 
>>>>>>>> This patch iterates the upper inode list in overlayfs to sync only the
>>>>>>>> target dirty inodes and wait for their completion. This reduces the
>>>>>>>> cost of synchronization and avoids seriously impacting the IO
>>>>>>>> performance of unrelated processes. In the special case where an
>>>>>>>> overlayfs instance has very few dirty inodes but many clean upper
>>>>>>>> inodes, the iteration may waste time and make synchronization slower
>>>>>>>> than before, but we think the potential performance improvement far
>>>>>>>> outweighs the potential regression in most cases.
>>>>>>>> 
>>>>>>>> Signed-off-by: Chengguang Xu <cgxu519@xxxxxxxxxx>
>>>>>>>> ---
>>>>>>>> Changes since v5:
>>>>>>>> - Move sync related functions to new file sync.c.
>>>>>>>> - Introduce OVL_EVICT_PENDING flag to avoid race conditions.
>>>>>>>> - If the ovl inode flag is I_WILL_FREE or I_FREEING then syncfs
>>>>>>>> will wait until OVL_EVICT_PENDING is cleared.
>>>>>>>> - Move the inode sync operation into evict_inode from drop_inode.
>>>>>>>> - Call upper_sb->sync_fs with no-wait after syncing dirty upper
>>>>>>>> inodes.
>>>>>>>> - Hold s_inode_wblist_lock until deleting the item from the list.
>>>>>>>> - Remove the write_inode function because it is no longer used.
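
For readers skimming the thread: the flow described above boils down to
roughly the sketch below. This is only a simplified illustration of the
idea, not the actual patch code; error handling and the !wait case are
omitted, and everything beyond ovl_sync_inodes()/ovl_wait_inodes() is
paraphrased from the description.

static int ovl_sync_fs(struct super_block *sb, int wait)
{
        struct ovl_fs *ofs = sb->s_fs_info;
        struct super_block *upper_sb;
        int ret = 0;

        if (!ofs->upper_mnt)
                return 0;
        upper_sb = ofs->upper_mnt->mnt_sb;

        /* Write back only the dirty upper inodes of this overlay instance. */
        ovl_sync_inodes(sb);

        /* Kick the upper filesystem's own sync_fs without waiting... */
        down_read(&upper_sb->s_umount);
        if (upper_sb->s_op->sync_fs)
                ret = upper_sb->s_op->sync_fs(upper_sb, 0);
        up_read(&upper_sb->s_umount);

        /* ...then wait only for the writeback started above. */
        ovl_wait_inodes(sb);

        return ret;
}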
>>>>>>>> 
>>> [...]
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> +                               /* We need to do this as 'inode' can be freed by now */
>>>>>>> 
>>>>>>> Could oi_prev have been freed or removed from upper_inodes list?
>>>>>>> I don't see why not.
>>>>>>> You could try to continue iteration from iput_inode if it is not NULL,
>>>>>>> but what if it is NULL?
>>>>>> 
>>>>>> When iterating the list we either take a reference or wait until the item
>>>>>> gets deleted, so I think there is no chance for the previous inode to go
>>>>>> away underneath us.
>>>>>> 
>>>>> 
>>>>> Maybe I am missing something.
>>>>> When iterating the upper_inodes list you get a reference on the current
>>>>> inode (oi) and maybe hold a reference on the previous one via iput_inode.
>>>>> The previous inode may have a reference from the wb_list, but that
>>>>> reference can be lost while we are not holding the wblist lock. No?
>>>> 
>>>> The reference on the previous inode can be lost if the inode is not our
>>>> target, but that can only happen after iput(iput_inode); until then it is safe.
>>>> 
>>> 
>>> I was talking about the case where iput_inode is NULL.
>> 
>> If iput_inode is NULL, the previous inode's reference is not put until the
>> wait finishes, so this case is even safer than when iput_inode has a value.
>> Meanwhile, if an inode has a reference, it won't get deleted from the list.
>> 
> 
> Everything you write may be true, but when you write "list" and "lock"
> without specifying which list and which lock, the explanation is not clear.
> 
> If an inode has a reference it won't get deleted from the upper_inodes list,
> but it can be deleted from the s_inodes_wb list once the wait for IO is
> completed, and it can then lose its last reference and get deleted from the
> upper_inodes list.
> 
> As far as I can tell, ovl_sync_inodes() and ovl_wait_inodes() are not
> synchronized by any exclusive lock, so when you let go of all the
> ovl_wait_inodes() locks for finish_wait(), you have no guarantee on any
> of the inodes on upper_inodes anymore, and specifically no guarantee
> that oi_prev is alive.

I'm guessing you are worried about the case where one syncfs(A) is waiting
for IO to complete on inode 100 while, at the same time, another syncfs(B)
is iterating inode 200, which is the entry after inode 100 in the
upper_inodes list. Right?

In this case, the second syncfs(B) takes another reference on inode 100 in
ovl_sync_inodes(), so even after the refcount of inode 100 is dropped in
ovl_wait_inodes() for syncfs(A), inode 100 is still safe to use in
ovl_sync_inodes() for syncfs(B).
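
To make this concrete, the walk in ovl_sync_inodes() follows roughly the
pattern below. This is a simplified sketch, not the patch verbatim: the
upper_inodes list/lock field names are placeholders, and the real patch
does the dirty/eviction checks differently (OVL_EVICT_PENDING etc.).

static void ovl_sync_inodes(struct super_block *sb)
{
        struct ovl_fs *ofs = sb->s_fs_info;
        struct ovl_inode *oi;
        struct inode *iput_inode = NULL;

        spin_lock(&ofs->upper_inodes_lock);
        list_for_each_entry(oi, &ofs->upper_inodes, upper_inodes_list) {
                struct inode *inode = &oi->vfs_inode;
                struct inode *upper = ovl_inode_upper(inode);

                /* Skip clean inodes; igrab() fails for inodes going away. */
                if (!upper || !(upper->i_state & I_DIRTY) || !igrab(inode))
                        continue;

                spin_unlock(&ofs->upper_inodes_lock);

                /*
                 * Only now drop the reference that pinned the previous
                 * entry: the current entry is already pinned, so the walk
                 * always holds an anchor on the list while the lock is
                 * dropped.
                 */
                if (iput_inode)
                        iput(iput_inode);

                /* Start writeback on the corresponding upper inode. */
                filemap_fdatawrite(upper->i_mapping);

                iput_inode = inode;
                spin_lock(&ofs->upper_inodes_lock);
        }
        spin_unlock(&ofs->upper_inodes_lock);

        if (iput_inode)
                iput(iput_inode);
}

So by the time iput(iput_inode) can drop the last reference on the previous
inode, the walk has already pinned the next one, which is why resuming the
iteration is safe in this path.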


> 
> As far as I can tell, your suggestion to synchronize both ovl_sync_inodes()
> and ovl_wait_inodes() with sb->s_sync_lock will take care of that problem.
> Since in the current implementation both functions are called with wait == 1
> anyway, I don't see any problem with synchronizing the two functions. Do you?
> 

I did some basic tests (heavy untar && concurrent syncfs) to verify my
proposal, and so far there seems to be no problem. But I want to point out
that my proposal is not mainly about addressing the synchronization problem
between ovl_sync_inodes() and ovl_wait_inodes(); I just want the code to be
simple and easy to understand.
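
Just to illustrate where s_sync_lock would sit in the wait path, a
wait_sb_inodes()-style ovl_wait_inodes() would look roughly like the sketch
below. To be clear, this is not the patch's actual wait loop (which list is
walked and which mapping is waited on are glossed over here); it only shows
the serialization being suggested.

static void ovl_wait_inodes(struct super_block *sb)
{
        LIST_HEAD(sync_list);

        /*
         * With ovl_sync_inodes() taking the same lock, s_sync_lock
         * serializes the two paths, the same way wait_sb_inodes()
         * serializes concurrent sync(2) waiters.
         */
        mutex_lock(&sb->s_sync_lock);
        spin_lock_irq(&sb->s_inode_wblist_lock);
        list_splice_init(&sb->s_inodes_wb, &sync_list);

        while (!list_empty(&sync_list)) {
                struct inode *inode = list_first_entry(&sync_list,
                                                struct inode, i_wb_list);
                struct address_space *mapping = inode->i_mapping;

                /* Put the inode back for the next waiter. */
                list_move_tail(&inode->i_wb_list, &sb->s_inodes_wb);

                if (!mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
                        continue;

                /* igrab() fails for I_FREEING/I_WILL_FREE inodes. */
                if (!igrab(inode))
                        continue;

                spin_unlock_irq(&sb->s_inode_wblist_lock);

                filemap_fdatawait_keep_errors(mapping);
                cond_resched();
                iput(inode);

                spin_lock_irq(&sb->s_inode_wblist_lock);
        }
        spin_unlock_irq(&sb->s_inode_wblist_lock);
        mutex_unlock(&sb->s_sync_lock);
}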


> In any case, if you do agree with my analysis and add sb->s_sync_lock
> to ovl_wait_inodes(), please document the reasons that make the upper_inodes
> iteration in ovl_wait_inodes() safe, because I don't think they are as
> trivial as you make them sound.

I think if we switch to my proposal, the code will be easy enough to
understand, so there will not be much doubt about the safety of the list
iteration.


Thanks,
Chengguang.





