Re: [PATCH 5/5] vfs: Avoid unnecessary WB_SYNC_NONE writeback during sync(1)

Hi Christoph:

On Thu, Jul 28, 2011 at 1:37 PM, Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> On Thu, Jul 28, 2011 at 07:42:03PM +0200, Jan Kara wrote:
>> > be do a blocking sync_inodes_sb, but then a non-blocking ->sync_fs?
>>   I tried to explain this in the changelog but it seems I failed.
>> wakeup_flusher_threads(0) starts completely asynchronous writeback of all
>> bdis. We could do an iteration of ->sync_fs with wait == 0 over all
>> superblocks just after wakeup_flusher_threads(0) returns, but at that
>> moment almost nothing is written yet because the flusher threads have just
>> started doing their work. So what useful work can ->sync_fs(0) do in such
>> case? E.g. ocfs2 could start processing its orphan list, but starting
>> quota writeback as XFS does would be fruitless, since I expect quota to
>> change heavily as a result of the writeback. Similarly, ext3 or ext4
>> cannot really do useful work while most inodes are not yet written.
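
(For concreteness, the flag being discussed here is the second argument of
the ->sync_fs super_operations hook, int (*sync_fs)(struct super_block *sb,
int wait).  A minimal, purely hypothetical implementation, not taken from
any in-tree filesystem, might look like the sketch below; it shows why a
wait == 0 call made before writeback has finished has little to work with:)

/*
 * Illustrative only: "examplefs" and its helpers are made up.  With
 * wait == 0 the method may only kick off work; with wait == 1 it must
 * not return until the filesystem is consistent on disk.
 */
static int examplefs_sync_fs(struct super_block *sb, int wait)
{
        struct examplefs_sb_info *sbi = sb->s_fs_info;

        /* Cheap either way: queue dirty metadata for writeout. */
        examplefs_start_metadata_writeout(sbi);

        if (!wait)
                return 0;       /* non-blocking pass: nothing to wait for */

        /* Blocking pass: commit the journal and wait for it to hit disk. */
        return examplefs_commit_and_wait(sbi);
}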
>
> The way xfs currently starts quota writeback actually seems pretty
> useless to me already.  I don't think removing it will have a
> performance impact, but I will have to benchmark it carefully.  Remember
> that even right now the chance that all inodes are actually written
> is fairly low on a busy system, so we might just create a lot of
> duplicate work by the two sync_fs passes.
>
>>
>> It would be nice to call ->sync_fs with wait == 0 when the flusher thread
>> is done with the bdi, but there's no easy way to do that. We could use a
>> completion for this, but we'd have to track one for every bdi, which isn't
>> very nice. That's why I chose to synchronize with the flusher threads using
>> the synchronous inode writeback: when that is finished we know the flusher
>> threads are done, and so calling ->sync_fs() can be useful. So do you think
>> calling ->sync_fs() with wait == 0 that late is a problem?
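
(To make "track these for every bdi" concrete, here is a rough sketch of the
rejected idea.  None of these structures or hooks exist anywhere; it only
shows the shape of the bookkeeping that would be needed:)

#include <linux/completion.h>

/*
 * Hypothetical: one completion per bdi, completed by the flusher thread
 * once the work queued by wakeup_flusher_threads(0) is finished, so that
 * sync could wait on it before calling ->sync_fs(sb, 0).
 */
struct bdi_sync_cookie {
        struct completion flusher_done;
};

static void example_wait_for_flusher(struct bdi_sync_cookie *cookie)
{
        wait_for_completion(&cookie->flusher_done);
        /* only now would ->sync_fs(sb, 0) find most inodes written */
}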
>
> It's far too many subtle changes for one patch.  The current sync
> sequence in pseudo-code is:
>
>
> wakeup_flusher_threads()
> for_each_sb {
>        writeback_inodes_sb();
>        sync_fs(nowait);
>        __sync_blockdev(nowait);
> }
> for_each_sb {
>        sync_inodes_sb();
>        sync_fs(wait);
>        __sync_blockdev(nowait);

Do you mean "__sync_blockdev(wait)" here?  'Cause that's what
__sync_filesystem() does if wait=1.

I'd have just chalked it up to a typo, but you repeated it everywhere
below too, so I wanted to see if I'm missing something.
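
For reference, here is roughly what __sync_filesystem() looks like at this
point, paraphrased from memory of fs/sync.c with the quota hook and the
noop-bdi check left out, so details may differ; the last line is the one I
mean:

static int __sync_filesystem(struct super_block *sb, int wait)
{
        if (wait)
                sync_inodes_sb(sb);
        else
                writeback_inodes_sb(sb);

        if (sb->s_op->sync_fs)
                sb->s_op->sync_fs(sb, wait);

        /* the wait flag goes straight through to the block device sync */
        return __sync_blockdev(sb->s_bdev, wait);
}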

Curt

> }
>
> and currently your patch changes it to:
>
> wakeup_flusher_threads()
> for_each_sb {
>        sync_inodes_sb();
>        sync_fs(nowait);
> }
> for_each_bdev
>        __sync_blockdev(nowait);
> for_each_sb
>        sync_fs(wait);
> for_each_bdev
>        __sync_blockdev(wait);
>
>
> Let's keep this bisectable, with one behaviour change and one goal at a
> time.
>
> That is, the first patch in the sequence splits the loops and moves to:
>
> wakeup_flusher_threads()
> for_each_sb
>        writeback_inodes_sb();
> for_each_sb
>        sync_fs(nowait);
> for_each_sb
>        __sync_blockdev(nowait);
> for_each_sb
>        sync_inodes_sb();
> for_each_sb
>        sync_fs(wait);
> for_each_sb
>        __sync_blockdev(nowait);
>
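
(A rough sketch of how that split-loop step could look in fs/sync.c;
iterate_supers() is the existing helper, but the *_one_sb callbacks and the
nowait/wait plumbing here are made up for illustration, not actual patch
code:)

SYSCALL_DEFINE0(sync)
{
        int nowait = 0, wait = 1;

        wakeup_flusher_threads(0);

        /* first, the non-blocking passes over all superblocks */
        iterate_supers(writeback_inodes_one_sb, NULL);
        iterate_supers(sync_fs_one_sb, &nowait);
        iterate_supers(sync_blockdev_one_sb, &nowait);

        /* then the blocking passes */
        iterate_supers(sync_inodes_one_sb, NULL);
        iterate_supers(sync_fs_one_sb, &wait);
        iterate_supers(sync_blockdev_one_sb, &nowait);

        return 0;
}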
> Then as next steps, add one patch that moves to iterating over the block
> devices for the __sync_blockdev pass, and one that drops the
> writeback_inodes_sb call, at which point we'll have:
>
> wakeup_flusher_threads()
> for_each_sb
>        sync_fs(nowait);
> for_each_bdev
>        __sync_blockdev(nowait);
> for_each_sb
>        sync_inodes_sb();
> for_each_sb
>        sync_fs(wait);
> for_each_bdev
>        __sync_blockdev(nowait);
>
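
(The for_each_bdev passes would need an iterator over all open block
devices; no such helper exists yet, so the iterate_bdevs() below is assumed
purely for illustration:)

/* Sketch only: iterate_bdevs() is assumed to walk every open block device
 * (e.g. the inodes of the bdev pseudo-filesystem) and call fn on each. */
static void sync_one_bdev(struct block_device *bdev, void *arg)
{
        __sync_blockdev(bdev, *(int *)arg);
}

/* in sys_sync(): iterate_bdevs(sync_one_bdev, &nowait); */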
> And now we can think about optimizations.  Randomly changing the order
> as done right now doesn't seem too helpful.  My theory would be that
> we could simply drop the nowait versions of sync_fs entirely, as the
> little work they do gets invalidated by the inode writeback later on
> anyway, but we'd really have to validate that theory by benchmark
> runs on the filesystems where people care about performance.
>

