Re: [PATCH v6] ovl: Improving syncfs efficiency

On Mon, Jan 22, 2018 at 6:00 PM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>
>> On Jan 22, 2018, at 8:58 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>>
>> On Mon, Jan 22, 2018 at 2:05 PM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>>>>
>>>> On Jan 22, 2018, at 7:27 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>>>>
>>>> On Mon, Jan 22, 2018 at 11:47 AM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>>>>> Currently, a syncfs(2) call on overlayfs simply calls sync_filesystem()
>>>>> on the upper_sb to synchronize all dirty inodes in the upper filesystem,
>>>>> regardless of which overlay owns each inode. In the container use case,
>>>>> where multiple containers use the same underlying upper filesystem,
>>>>> this has the shortcomings below.
>>>>>
>>>>> (1) Performance
>>>>> Synchronization may be heavy because it syncs inodes that are
>>>>> unnecessary for the target overlayfs.
>>>>>
>>>>> (2) Interference
>>>>> Unplanned synchronization will likely impact the IO performance of
>>>>> unrelated container processes on other overlayfs instances.
>>>>>
>>>>> This patch iterates the upper inode list in overlayfs to sync only the
>>>>> target dirty inodes and wait for completion. This reduces the cost of
>>>>> synchronization and does not seriously impact the IO performance of
>>>>> unrelated processes. In the special case of an overlayfs with very few
>>>>> dirty inodes but a lot of clean upper inodes, the iteration may waste
>>>>> time, making synchronization slower than before, but we think the
>>>>> potential for performance improvement far outweighs the potential for
>>>>> performance regression in most cases.
>>>>>
>>>>> Signed-off-by: Chengguang Xu <cgxu519@xxxxxxxxxx>
>>>>> ---
>>>>> Changes since v5:
>>>>> - Move sync related functions to new file sync.c.
>>>>> - Introduce OVL_EVICT_PENDING flag to avoid race conditions.
>>>>> - If the ovl inode flag is I_WILL_FREE or I_FREEING, then syncfs
>>>>> will wait until OVL_EVICT_PENDING is cleared.
>>>>> - Move inode sync operation into evict_inode from drop_inode.
>>>>> - Call upper_sb->sync_fs with no-wait after syncing dirty upper
>>>>> inodes.
>>>>> - Hold s_inode_wblist_lock until deleting the item from the list.
>>>>> - Remove the write_inode function because it is no longer used.
>>>>>
[...]
>>>>
>>>>
>>>>
>>>>> +                               /* We need to do this as 'inode' can be freed by now */
>>>>
>>>> Could oi_prev have been freed or removed from the upper_inodes list?
>>>> I don't see anything preventing that.
>>>> You could try to continue iteration from iput_inode if it is not NULL,
>>>> but what if it is NULL?
>>>
>>> When iterating the list we either hold a reference or wait until the item
>>> gets deleted, so I think there is no chance for the previous inode to be
>>> freed.
>>>
>>
>> Maybe I am missing something.
>> When iterating the upper_inodes list you get a reference on the current
>> inode (oi) and maybe hold a reference on the previous one via iput_inode.
>> The previous inode may have a reference from the wb_list, but that
>> reference can be lost while we are not holding the wblist lock. No?
>
> The reference on the previous inode can be lost if that inode is not our
> target, but that can only happen after iput(iput_inode); before that point
> it is safe.
>

I was talking about the case where iput_inode is NULL.

> When deleting an item from the list we take the lock to protect against
> concurrent operations, so as long as we hold the list lock it is safe to
> touch items in the list.
>

But we have let go of the list lock.
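
To spell out the window I mean, here is the kind of interleaving I'm worried
about (hypothetical, for the case where iput_inode is NULL, so 'prev' is only
pinned by its presence on the wb list):

	/*
	 * syncfs thread                      eviction/writeback side
	 * -------------                      -----------------------
	 * spin_unlock(&sb->s_inode_wblist_lock);
	 *                                    list_del_init(&prev->i_wb_list);
	 *                                    iput(prev);   (drops last ref)
	 *                                    ...prev is freed...
	 * continue iteration from prev  =>   use-after-free
	 */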

>>
>>>
>>>> Maybe instead of setting iput_inode = NULL when adding to the inodes_wb
>>>> list, get another reference to the inode and always store iput_inode =
>>>> inode to use as prev in this case?
>>>
>>> Maybe it's better than the current code, but I think it makes the
>>> behavior more complex to understand, so please let me wait for more
>>> comments on this approach.
>>>
>>
>> Sure, let's wait for other comments, but I can't say I understand what is
>> complex about the change I propose. If anything it is simpler to
>> understand, especially if you rename iput_inode to prev_inode:
>>
>> +               /*
>> +                * If the ovl inode is not on the sb->s_inodes_wb list,
>> +                * add it to the list and hold an inode reference until
>> +                * IO on the inode has completed.
>> +                */
>> +               spin_lock(&sb->s_inode_wblist_lock);
>> +               if (list_empty(&inode->i_wb_list)) {
>> +                       list_add_tail(&inode->i_wb_list, &sb->s_inodes_wb);
>> +                       ihold(inode);
>> +               }
>> +               iput_inode = inode;
>
>
> Actually, I forgot to check whether the previous item is the list head or
> not, so I think there is a problem when deleting the very first item in the
> list; even if we use iput_inode here, we can't get rid of the list
> operation because of that. If you think the code should be more readable,
> how about modifying it like below?
>
> 1. take the sb->s_sync_lock mutex
> 2. splice to a local temp list
> 3. operate on the first entry each time
> 4. put the entries back onto sb->s_inodes_wb one by one
>
> I think there won't be much concurrent syncfs running at the same time.

This could work. Need to see it.
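
Roughly, I imagine something like the untested sketch below, modeled on
wait_sb_inodes() in fs/fs-writeback.c. The function name is made up, and I
dropped your step 4 (re-adding the entries) in favor of transferring the
list's ihold() reference to the waiter, so treat it as a strawman:

	static void ovl_wait_inodes(struct super_block *sb)
	{
		LIST_HEAD(sync_list);
		struct inode *inode;

		/* 1. Serialize concurrent syncfs(2) waiters. */
		mutex_lock(&sb->s_sync_lock);

		/* 2. Splice the whole wb list onto a private list. */
		spin_lock(&sb->s_inode_wblist_lock);
		list_splice_init(&sb->s_inodes_wb, &sync_list);

		/*
		 * 3. Always operate on the first entry, so there is no
		 * previous-entry cursor that can be freed behind our back.
		 */
		while (!list_empty(&sync_list)) {
			inode = list_first_entry(&sync_list, struct inode,
						 i_wb_list);
			list_del_init(&inode->i_wb_list);
			spin_unlock(&sb->s_inode_wblist_lock);

			/*
			 * We inherit the reference that ihold() took when
			 * the inode was added to the list, so the inode
			 * cannot go away before the iput() below.
			 */
			filemap_fdatawait_keep_errors(inode->i_mapping);
			iput(inode);
			cond_resched();

			spin_lock(&sb->s_inode_wblist_lock);
		}
		spin_unlock(&sb->s_inode_wblist_lock);
		mutex_unlock(&sb->s_sync_lock);
	}

The tradeoff vs. your step 4 is that a second syncfs caller would no longer
see these inodes on the list, but s_sync_lock serializes the waiters anyway,
so I don't think anything is lost.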

>
>
>>
>>>
>>>>
>>>> It would be good if you wrote the "life cycle" of ovl_inode in a
>>>> header comment to
>>>> this file because it has become non-trivial to follow.
>>>> An explicit mention of the locking order is also very good to have in
>>>> the header comment.
>>>>
>>>>
>>>
>>> Let me add in next version.
>>>
>>> Talking about the life cycle of the ovl_inode: currently I only mark
>>> OVL_EVICT_PENDING for an ovl_inode which has an upper inode, is that
>>> acceptable?
>>>
>>
>> Until your change, the ovl_inode life cycle was 'generic': evict when the
>> refcount drops to 0.
>> Now an inode can be in the OVL_EVICT_PENDING state, and it can be waiting
>> for sync on or off the upper_inodes list. I'm just saying it would be nice
>> to document all the possible states of an ovl_inode during eviction, and
>> all the locking and refcount guarantees, in one place, to see that we did
>> not miss anything.
>>
>> The volume of this documentation is entirely up to you.
>> I think it will be good for you as well to tell the whole story from
>> start to end
>> and verify there are no holes in the plot.
>
>
> I did not mean to write only the above into the document; I'm just asking
> for your opinion on the OVL_EVICT_PENDING flag.

If you meant the flag name or using a flag in general, that seems good to me.

> Currently the flag cycle of an ovl inode splits into two different cycles,
> depending on whether it has an upper inode or not.
>
> 1. having upper inode
> (allocation) -> new -> none -> OVL_EVICT_PENDING|I_FREEING -> I_FREEING|I_CLEAR -> (destruction)
>

That's good. If there are other conditions that guarantee these state
transitions are not racy, like the list locks, the concurrent sync lock and
elevated refcounts, you should mention them as well.
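
Something as simple as a header comment in the new sync.c would do. Here is a
strawman of what I mean; the states are from your description above, and the
locking/refcount lines are placeholders for you to fill in with the real
rules:

	/*
	 * Life cycle of an ovl inode that has an upper inode:
	 *
	 *  (allocation) -> I_NEW -> <in use> ->
	 *      OVL_EVICT_PENDING|I_FREEING ->  (syncfs may wait here)
	 *      I_FREEING|I_CLEAR -> (destruction)
	 *
	 * Guarantees:
	 *  - <which lock/refcount keeps an inode on the upper_inodes list
	 *    alive while syncfs walks the list>
	 *  - <when syncfs must wait for OVL_EVICT_PENDING to clear>
	 *
	 * Locking order:
	 *  - <outermost first, e.g. the order between the sync mutex, the
	 *    upper_inodes list lock and s_inode_wblist_lock>
	 */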

Cheers,
Amir.



