On Tue, Aug 05, 2008 at 11:24:39PM +0530, Aneesh Kumar K.V wrote: > Hi, > > I am trying to understand the expectation on filesystem > writepages by __fsync_super -> generic_sync_sb_inodes call path. > __fsync_super make two calls to generic_sync_sb_inodes and the > second call is made with WB_SYNC_ALL. The problem is, i am finding > multiple ways we may 'break' from the generic_sync_sb_inodes > while (!list_empty(&sb->s_io)) loop. For ex: > > a) When wbc->nr_to_write <=0 > This should normally indicate that we wrote more than we requested > That means we have more dirty pages . So why break from loop ? > This may be ok for nr_to_write limited writeback. But is it ok > for __fsync_super ? Should never happen - nr_to_write is set to 150% of the number of dirty pages in the system in sync_inodes_sb(). If this is not enough, then something is skipping lots of dirty pages during a data integrity sync and that sounds wrong.... > b) pages_skipped != wbc->pages_skipped. > That means writepages was not able to write some of the pages > and it redirtied the same. We add such inode to s_dirty list. But > we don't loop through the s_dirty list later trying to write the > dirty pages. We actually call generic_sync_sb_inodes again. That > can actually cause s_dirty list content to move to s_io once > via queue_io. But then we can have pages skipped again in the second > call to generic_sync_sb_inodes ? Again, should never happen if WB_SYNC_ALL is set. We should block waiting for locks, etc (rather than behaving in a non-blocking manner and redirtying the page instead of blocking) if WB_SYNC_ALL is set. i.e. if WB_SYNC_ALL is set we should never skip pages...... > Shouldn't we have a variant that actually loop trough s_dirty and s_more_io > list MULTIPLE times and make sure we force all the dirty pages to disk > before __fsync_super returns ? Or is there an expectation from file > system writepages that would make sure calling generic_sync_sb_inodes > TWICE would guarantee that ? The two calls are for performance reasons. The first is non-blocking, async I/O to get the dirty data and inodes out to disk as fast and efficiently as possible. The second pass is a synchronous pass that provides the data integrity part of sync. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html