Re: [PATCH v3] ovl: Improving syncfs efficiency

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 12, 2018 at 9:13 AM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>>
>> 在 2018年1月11日,下午6:07,Jan Kara <jack@xxxxxxx> 写道:
>>
>> On Thu 11-01-18 05:30:33, Amir Goldstein wrote:
>>> [CC: Jan Kara]
>>>
>>> On Thu, Jan 11, 2018 at 3:48 AM, Chengguang Xu <cgxu519@xxxxxxxxxx> wrote:
>>>> Currently syncfs(2) call on overlayfs just simply call sync_filesystem()
>>>> on upper_sb to synchronize whole dirty inodes in upper filesystem
>>>> regardless of the overlay ownership of the inode. It has obvious
>>>> shortcomings as below.
>>>>
>>>> (1) Performance
>>>> Synchronization is probably heavy in most cases, especially when upper
>>>> filesystem is not dedicated to target overlayfs.
>>>>
>>>> (2) Interference
>>>> Unplanned synchronization will probably impact IO performance of the
>>>> processes which use other overlayfs on same upper filesystem or directly
>>>> use upper filesystem.
>>>>
>>>> This patch iterates overlay inodes to only sync target dirty inodes in
>>>> upper filesystem. By doing this, It is able to reduce cost of synchronization
>>>> and will not seriously impact IO performance of irrelative processes.
>>>>
>>>> Signed-off-by: Chengguang Xu <cgxu519@xxxxxxxxxx>
>>>
>>> Looks good.
>>>
>>> Reviewed-by: Amir Goldstein <amir73il@xxxxxxxxx>
>>>
>>> Jan,
>>>
>>> Would you mind taking a quick look at this?
>>>
>>> Overlayfs inodes have no dirty mapping, only the underlying upper inodes
>>> do, so sync_inodes_sb() is not enough for overlay syncfs().
>>>
>>> This code creates the list of upper inodes to sync, syncs them and then
>>> calls the underlying upper fs sync_fs().
>>>
>>> Does this look reasonable?
>>
>> If I understand the original changelog, this is a performance motivated
>> change. As such it should have a some measurements accompanied with it to
>> justify the change. I.e., how much does this help? What is the exact setup
>> where this matters? Do users really care?
>>
>> More technical comments inline.
>>
>>>> @@ -80,6 +83,8 @@ struct ovl_inode {
>>>>
>>>>        /* synchronize copy up and more */
>>>>        struct mutex lock;
>>>> +       /* ovl inode sync list */
>>>> +       struct list_head sync_list;
>>>> };
>>
>> Additional overhead of 16-bytes per inode - not a huge deal since this is
>> overlay specific but it's something to keep in mind. Could not we reuse
>> i_io_list of the overlay inode instead?
>
> Hi Jan,
>
> Thanks for your review and comments.
> Actually I thought i_io_list in my very first version, however, it implies
> that inode is connected to proper bdi_writeback but overlay does not have,
> so it might be incompatible in some contexts(for example, evict).
> Maybe we can reuse i_wb_list instead.
>
>
>>
>>>> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
>>>> index 76440fe..947bb83 100644
>>>> --- a/fs/overlayfs/super.c
>>>> +++ b/fs/overlayfs/super.c
>> ...
>>>> +static void ovl_sync_inodes(struct super_block *sb)
>>>> +{
>>>> +       struct ovl_fs *ofs = sb->s_fs_info;
>>>> +       struct inode *inode, *i_next;
>>>> +       struct inode *upper_inode;
>>>> +       struct blk_plug plug;
>>>> +
>>>> +       struct writeback_control wbc = {
>>>> +               .sync_mode              = WB_SYNC_ALL,
>>>> +               .for_sync               = 1,
>>>> +               .range_start            = 0,
>>>> +               .range_end              = LLONG_MAX,
>>>> +               .nr_to_write            = LONG_MAX,
>>>> +       };
>>>> +
>>>> +       blk_start_plug(&plug);
>>>> +       spin_lock(&sb->s_inode_list_lock);
>>>> +       list_for_each_entry_safe(inode, i_next, &sb->s_inodes, i_sb_list) {
>>
>> Why a _safe iteration variant here? If you hope it allows you to safely
>> resume iteration after dropping sb->s_inode_list_lock, you are wrong
>> because i_next inode can be removed from the list once you drop that lock
>> as well.  You need to be more careful when iterating inodes. See how e.g.
>> fs/notify/fsnotify.c:fsnotify_unmount_inodes() plays tricks with always
>> keeping one inode reference to be able to resume iteration of the inode
>> list.
>>
>> Another issue is that this will iterate all inodes on the overlayfs list.
>> Both clean and dirty ones (well, those where underlying upper inode is
>> dirty). When there are only a few underlying dirty inodes, this may be
>> considerably slower than the original behavior. So this needs an
>> explanation in the changelog why the use case you care about is more
>> important than the use case this regresses. You might think that this will
>> only add an iteration of an in memory list, but on large machines, this
>> list may have millions of entries so it's iteration is not exactly fast.
>>
>>>> +               upper_inode = ovl_inode_upper(inode);
>>>> +               if (!upper_inode)
>>>> +                       continue;
>>>> +
>>>> +               spin_lock(&upper_inode->i_lock);
>>>> +               if (upper_inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE) ||
>>>> +                       list_empty(&upper_inode->i_io_list)) {
>>>> +                       spin_unlock(&upper_inode->i_lock);
>>>> +                       continue;
>>>> +               }
>>>> +               spin_unlock(&upper_inode->i_lock);
>>>> +
>>>> +               if (!igrab(inode))
>>>> +                       continue;
>>>> +
>>>> +               if (!igrab(upper_inode)) {
>>>> +                       iput(inode);
>>
>> iput() under a spinlock? Not good, that always needs a very good
>> explanation why this cannot be the last reference to drop. Preferably avoid
>> this altogether.
>
> Hi Amir,
>
> I reckon upper inode won't be deleted before overlay inode,so we can just
> hold reference of overlay inode here, am i right?
>

You are right that overlay inode holds a reference to upper inode, so you
don't need to grab upper_inode.
However, I think you may need to grab overlay inode reference before trying to
dereferecnce ovl_inode_upper(inode), so you are back to square one, having to
iput(inode) if (!upper_inode).
I'm afraid I don't know enough to propose a solution.
I suggest you start with Jan's suggestion to look at fsnotify_unmount_inodes().

Thanks,
Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystems Devel]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux