Re: [rfc][patch 2/3] fs: sync_sb_inodes fix

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 10, 2008 at 08:41:33AM -0500, Christoph Hellwig wrote:
> On Mon, Nov 10, 2008 at 02:23:08PM +0100, Nick Piggin wrote:
> > 
> > Fix data integrity semantics required by sys_sync, by iterating over all
> > inodes and waiting for any writeback pages after the initial writeout.
> > Comments explain the exact problem.
> > 
> > ---
> > Index: linux-2.6/fs/fs-writeback.c
> > ===================================================================
> > --- linux-2.6.orig/fs/fs-writeback.c
> > +++ linux-2.6/fs/fs-writeback.c
> > @@ -440,6 +440,7 @@ void generic_sync_sb_inodes(struct super
> >  				struct writeback_control *wbc)
> >  {
> >  	const unsigned long start = jiffies;	/* livelock avoidance */
> > +	int sync = wbc->sync_mode == WB_SYNC_ALL;
> >  
> >  	spin_lock(&inode_lock);
> >  	if (!wbc->for_kupdate || list_empty(&sb->s_io))
> > @@ -516,7 +517,42 @@ void generic_sync_sb_inodes(struct super
> >  		if (!list_empty(&sb->s_more_io))
> >  			wbc->more_io = 1;
> >  	}
> > -	spin_unlock(&inode_lock);
> > +
> > +	if (unlikely(sync)) {
> 
> Please don't mark things like this unlikely.  Static branch prediction
> is really nasty on many cpus when the prediction was wrong, and dynamic
> branch brediction can do thing much better.

There's a few aspects to this :)

Static branch hint instruction could indeed turn off dynamic prediction
so in that cause it would always mispredict even if the workload was
continually going the other way.


> In general never use unlikely for something like this which can happen
> with a normal workload.

However, I think you have to consider frequency, and also relative
importance of each case. This is going to be taken on sys_sync, and
sys_umount. And not taken for pdflush writeout.

I think sync and umount are both much less frequent, and also much
less important to be fast.

It can allow gcc to be much smarter about code genreation too. Although
AFAIK it mainly just moves the block out of the icache hot path, it
could in future do things like switch off predicated instructions for
both branches, and also do better register allocation. Just what gcc does
today can be fairly significant.

But... I don't feel strongly about this particular path, so I can remove
it.


> > +		 * we still have to wait for that writeout.
> > +		 */
> > +		list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> > +			struct address_space *mapping;
> > +
> > +			if (inode->i_state & (I_FREEING|I_WILL_FREE))
> > +				continue;
> > +			mapping = inode->i_mapping;
> > +			if (mapping->nrpages == 0)
> > +				continue;
> > +			__iget(inode);
> > +			spin_unlock(&inode_lock);
> > +			/* see dquot.c comment */
> 
> Not an exactly useful comment :)

/* see fs/dquot.c:add_dquot_ref() comment */ ?

Thanks, I will repost with these changes after waiting for more feedback.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux