On Mon, Nov 10, 2008 at 08:41:33AM -0500, Christoph Hellwig wrote: > On Mon, Nov 10, 2008 at 02:23:08PM +0100, Nick Piggin wrote: > > > > Fix data integrity semantics required by sys_sync, by iterating over all > > inodes and waiting for any writeback pages after the initial writeout. > > Comments explain the exact problem. > > > > --- > > Index: linux-2.6/fs/fs-writeback.c > > =================================================================== > > --- linux-2.6.orig/fs/fs-writeback.c > > +++ linux-2.6/fs/fs-writeback.c > > @@ -440,6 +440,7 @@ void generic_sync_sb_inodes(struct super > > struct writeback_control *wbc) > > { > > const unsigned long start = jiffies; /* livelock avoidance */ > > + int sync = wbc->sync_mode == WB_SYNC_ALL; > > > > spin_lock(&inode_lock); > > if (!wbc->for_kupdate || list_empty(&sb->s_io)) > > @@ -516,7 +517,42 @@ void generic_sync_sb_inodes(struct super > > if (!list_empty(&sb->s_more_io)) > > wbc->more_io = 1; > > } > > - spin_unlock(&inode_lock); > > + > > + if (unlikely(sync)) { > > Please don't mark things like this unlikely. Static branch prediction > is really nasty on many cpus when the prediction was wrong, and dynamic > branch brediction can do thing much better. There's a few aspects to this :) Static branch hint instruction could indeed turn off dynamic prediction so in that cause it would always mispredict even if the workload was continually going the other way. > In general never use unlikely for something like this which can happen > with a normal workload. However, I think you have to consider frequency, and also relative importance of each case. This is going to be taken on sys_sync, and sys_umount. And not taken for pdflush writeout. I think sync and umount are both much less frequent, and also much less important to be fast. It can allow gcc to be much smarter about code genreation too. Although AFAIK it mainly just moves the block out of the icache hot path, it could in future do things like switch off predicated instructions for both branches, and also do better register allocation. Just what gcc does today can be fairly significant. But... I don't feel strongly about this particular path, so I can remove it. > > + * we still have to wait for that writeout. > > + */ > > + list_for_each_entry(inode, &sb->s_inodes, i_sb_list) { > > + struct address_space *mapping; > > + > > + if (inode->i_state & (I_FREEING|I_WILL_FREE)) > > + continue; > > + mapping = inode->i_mapping; > > + if (mapping->nrpages == 0) > > + continue; > > + __iget(inode); > > + spin_unlock(&inode_lock); > > + /* see dquot.c comment */ > > Not an exactly useful comment :) /* see fs/dquot.c:add_dquot_ref() comment */ ? Thanks, I will repost with these changes after waiting for more feedback. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html