Re: __fsync_super and inode dirty pages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Aug 05, 2008 at 11:24:39PM +0530, Aneesh Kumar K.V wrote:
> Hi,
> 
> I am trying to understand the expectation on filesystem
> writepages by __fsync_super -> generic_sync_sb_inodes call path.
> __fsync_super make two calls to generic_sync_sb_inodes and the 
> second call is made with WB_SYNC_ALL. The problem is, i am finding 
> multiple ways we may 'break' from the generic_sync_sb_inodes
> while (!list_empty(&sb->s_io)) loop.  For ex:
> 
> a) When wbc->nr_to_write <=0 
>    This should normally indicate that we wrote more than we requested
>    That means we have more dirty pages . So why break from loop ?
>    This may be ok for nr_to_write limited writeback. But is it ok
>    for __fsync_super ?

Should never happen - nr_to_write is set to 150% of the number of
dirty pages in the system in sync_inodes_sb(). If this is not
enough, then something is skipping lots of dirty pages during a
data integrity sync and that sounds wrong....

> b) pages_skipped != wbc->pages_skipped.
>    That means writepages was not able to write some of the pages
>    and it redirtied the same. We add such inode to s_dirty list. But
>    we don't loop through the s_dirty list later trying to write the
>    dirty pages. We actually call generic_sync_sb_inodes again. That
>    can actually cause s_dirty list content to move to s_io once
>    via queue_io. But then we can have pages skipped again in the second
>    call to generic_sync_sb_inodes ?

Again, should never happen if WB_SYNC_ALL is set. We should block
waiting for locks, etc (rather than behaving in a non-blocking manner
and redirtying the page instead of blocking) if WB_SYNC_ALL is set.
i.e. if WB_SYNC_ALL is set we should never skip pages......

> Shouldn't we have a variant that actually loop trough s_dirty and s_more_io
> list MULTIPLE times and make sure we force all the dirty pages to disk
> before __fsync_super returns ? Or is there an expectation from file
> system writepages that would make sure calling generic_sync_sb_inodes
> TWICE would guarantee that ?

The two calls are for performance reasons. The first is
non-blocking, async I/O to get the dirty data and inodes out to disk
as fast and efficiently as possible.  The second pass is a
synchronous pass that provides the data integrity part of sync.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux