Re: [PATCH v6] ovl: Improving syncfs efficiency

On Tue, Jan 23, 2018 at 2:24 AM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>>
>> On Jan 23, 2018, at 2:45 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>>
>> On Mon, Jan 22, 2018 at 6:00 PM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>>>
>>>> On Jan 22, 2018, at 8:58 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>>>>
>>>> On Mon, Jan 22, 2018 at 2:05 PM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> On Jan 22, 2018, at 7:27 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>>>>>>
>>>>>> On Mon, Jan 22, 2018 at 11:47 AM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>>>>>>> Currently a syncfs(2) call on overlayfs simply calls sync_filesystem()
>>>>>>> on upper_sb to synchronize all dirty inodes in the upper filesystem,
>>>>>>> regardless of which overlay owns the inode. In the container use case,
>>>>>>> when multiple containers use the same underlying upper filesystem,
>>>>>>> this has the following shortcomings.
>>>>>>>
>>>>>>> (1) Performance
>>>>>>> Synchronization is potentially heavy because it also syncs inodes
>>>>>>> that are unnecessary for the target overlayfs.
>>>>>>>
>>>>>>> (2) Interference
>>>>>>> Unplanned synchronization will likely impact the IO performance of
>>>>>>> unrelated container processes on other overlayfs instances.
>>>>>>>
>>>>>>> This patch iterates the upper inode list in overlayfs to sync only the
>>>>>>> target dirty inodes and wait for their completion. This reduces the
>>>>>>> cost of synchronization and does not seriously impact the IO
>>>>>>> performance of unrelated processes. In the special case of an
>>>>>>> overlayfs with very few dirty inodes and a lot of clean upper inodes,
>>>>>>> the iteration may waste time and make synchronization slower than
>>>>>>> before, but we think the potential for performance improvement far
>>>>>>> surpasses the potential for performance regression in most cases.
>>>>>>>
>>>>>>> Signed-off-by: Chengguang Xu <cgxu519@xxxxxxxxxx>
>>>>>>> ---
>>>>>>> Changes since v5:
>>>>>>> - Move sync related functions to new file sync.c.
>>>>>>> - Introduce OVL_EVICT_PENDING flag to avoid race conditions.
>>>>>>> - If the ovl inode flag is I_WILL_FREE or I_FREEING then syncfs
>>>>>>> will wait until OVL_EVICT_PENDING is cleared.
>>>>>>> - Move inode sync operation into evict_inode from drop_inode.
>>>>>>> - Call upper_sb->sync_fs with no-wait after syncing dirty upper
>>>>>>> inodes.
>>>>>>> - Hold s_inode_wblist_lock until deleting item from list.
>>>>>>> - Remove the write_inode function because it is no longer used.
>>>>>>>
>> [...]
>>>>>>
>>>>>>
>>>>>>
>>>>>>> +                               /* We need to do this as 'inode' can be freed by now */
>>>>>>
>>>>>> Could oi_prev have been freed or removed from upper_inodes list?
>>>>>> I don't see why not.
>>>>>> You could try to continue iteration from iput_inode if it is not NULL,
>>>>>> but what if it is NULL?
>>>>>
>>>>> When iterating the list we either take a reference or wait until the item
>>>>> gets deleted, so I think there is no chance the previous inode becomes NULL.
>>>>>
>>>>
>>>> Maybe I am missing something.
>>>> When iterating the upper_inodes list you get a reference on current (oi)
>>>> and maybe have a reference on previous via iput_inode.
>>>> previous may have a reference from wb_list, but that reference can be lost
>>>> while we are not holding the wblist lock. No?
>>>
>>> The reference on the previous inode can be lost if the inode is not our
>>> target, but that can only happen after iput(iput_inode); before that it is safe.
>>>
>>
>> I was talking about the case where iput_inode is NULL.
>
> If iput_inode is NULL, the previous inode's reference will not be put until
> the wait finishes, so this case is safer than iput_inode having a value.
> Meanwhile, if an inode has a reference, it won't get deleted from the list.
>

Everything you write may be true, but when you say "list" and "lock"
without specifying which list and which lock, the explanation is not clear.

If an inode has a reference it won't get deleted from the upper_inodes list,
but it can be deleted from the s_inodes_wb list once the wait for IO is
completed, and then lose its last reference and get deleted from the
upper_inodes list.

As far as I can tell, ovl_sync_inodes() and ovl_wait_inodes() are not
synchronized by any exclusive lock, so when you let go of all the
ovl_wait_inodes() locks for finish_wait(), you have no guarantee about any
of the inodes on upper_inodes anymore, and specifically no guarantee
that oi_prev is still alive.
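
To make that concrete, here is a rough schematic of the kind of loop I am
picturing in ovl_wait_inodes() - not your actual v6 code; the list, lock and
helper names below are made up for illustration:

	spin_lock(&ofs->upper_inodes_lock);
	list_for_each_entry(oi, &ofs->upper_inodes, upper_inodes_list) {
		if (!ovl_evict_pending(oi))	/* hypothetical predicate */
			continue;

		oi_prev = list_entry(oi->upper_inodes_list.prev,
				     struct ovl_inode, upper_inodes_list);
		spin_unlock(&ofs->upper_inodes_lock);

		/*
		 * Window: nothing we hold pins oi_prev here.  Once its own
		 * IO wait completes on the sync side, it can lose its last
		 * reference, get evicted and be unlinked from upper_inodes.
		 */
		ovl_wait_evict_pending(oi);	/* prepare_to_wait/finish_wait */

		spin_lock(&ofs->upper_inodes_lock);
		oi = oi_prev;			/* resume point may be freed */
	}
	spin_unlock(&ofs->upper_inodes_lock);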

As far as I can tell, your suggestion to synchronize both ovl_sync_inodes()
and ovl_wait_inodes() with sb->s_sync_lock would take care of that problem.
Since in the current implementation both functions are called with wait == 1
anyway, I don't see any problem with synchronizing the two functions. Do you?
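
In other words, something along these lines in ovl_sync_fs() - just a minimal
sketch of the idea, assuming your v6 helpers and the usual ovl_fs layout, not
a tested patch:

static int ovl_sync_fs(struct super_block *sb, int wait)
{
	struct ovl_fs *ofs = sb->s_fs_info;
	struct super_block *upper_sb;

	if (!ofs->upper_mnt)
		return 0;

	/* sync_filesystem() calls us twice; doing the work with wait == 1 is enough */
	if (!wait)
		return 0;

	upper_sb = ofs->upper_mnt->mnt_sb;

	down_read(&upper_sb->s_umount);
	/* serialize both phases against other syncers of upper_sb */
	mutex_lock(&upper_sb->s_sync_lock);
	ovl_sync_inodes(sb);	/* start writeback on our dirty upper inodes */
	ovl_wait_inodes(sb);	/* wait for that writeback to complete */
	mutex_unlock(&upper_sb->s_sync_lock);

	/* followed by upper_sb->s_op->sync_fs(upper_sb, ...) as in v6 */
	up_read(&upper_sb->s_umount);

	return 0;
}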

In any case, if you do agree with my analysis and add sb->s_sync_lock
to ovl_wait_inodes(), please document the reasons that make the
upper_inodes iteration in ovl_wait_inodes() safe, because I don't think
they are as trivial as you make them sound.
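
For reference, the way mainline wait_sb_inodes() in fs/fs-writeback.c keeps
its own iteration safe may be worth documenting against: under
sb->s_sync_lock it always consumes from the head of a private snapshot of
s_inodes_wb and pins each inode before dropping the list lock, so it never
has to trust a saved cursor. Roughly (condensed; RCU, cond_resched() and the
umount assertion elided):

static void wait_sb_inodes(struct super_block *sb)
{
	LIST_HEAD(sync_list);

	mutex_lock(&sb->s_sync_lock);
	spin_lock_irq(&sb->s_inode_wblist_lock);
	/* snapshot the list so the walk is bounded */
	list_splice_init(&sb->s_inodes_wb, &sync_list);

	while (!list_empty(&sync_list)) {
		struct inode *inode = list_first_entry(&sync_list,
						       struct inode, i_wb_list);

		/* put it back; writeback completion removes it from the list */
		list_move_tail(&inode->i_wb_list, &sb->s_inodes_wb);
		if (!mapping_tagged(inode->i_mapping, PAGECACHE_TAG_WRITEBACK))
			continue;

		spin_unlock_irq(&sb->s_inode_wblist_lock);

		spin_lock(&inode->i_lock);
		if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) {
			spin_unlock(&inode->i_lock);
			spin_lock_irq(&sb->s_inode_wblist_lock);
			continue;
		}
		__iget(inode);		/* pin before waiting without the lock */
		spin_unlock(&inode->i_lock);

		filemap_fdatawait_keep_errors(inode->i_mapping);
		iput(inode);

		spin_lock_irq(&sb->s_inode_wblist_lock);
	}
	spin_unlock_irq(&sb->s_inode_wblist_lock);
	mutex_unlock(&sb->s_sync_lock);
}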

Thanks,
Amir.



