Re: [Bug #12604] Commit 31a12666d8f0c22235297e1c1575f82061480029 slows down Berkeley DB

Jan Kara <jack@xxxxxxx> · Thu, 12 Feb 2009 15:32:49 +0100



On Thu 12-02-09 04:34:23, Nick Piggin wrote:
> On Wed, Feb 11, 2009 at 06:02:19PM -0800, Linus Torvalds wrote:
> > 
> > 
> > On Thu, 12 Feb 2009, Nick Piggin wrote:
> > 
> > > On Tue, Feb 10, 2009 at 05:28:30PM +0100, Jan Kara wrote:
> > > > On Sun 08-02-09 20:21:42, Rafael J. Wysocki wrote:
> > > > > This message has been generated automatically as a part of a report
> > > > > of recent regressions.
> > > > > 
> > > > > The following bug entry is on the current list of known regressions
> > > > > from 2.6.28.  Please verify if it still should be listed and let me know
> > > > > (either way).
> > > >   Yes, I've verified with latest git and the regression is still there.
> > > 
> > > I'm working on this FWIW...
> > 
> > Shouldn't we just revert it? The code does look to be broken.
> > 
> > It also looks like the interaction with that ever-buggy "nr_to_write" 
> > thing are totally wrong. I can see that whole
> > 
> > 	if (!cycled) {
> > 		..
> > 		index = 0;
> > 		goto retry
> > 	}
> > 
> > doing all the wrong things: if we ever hit the combination of 
> > "!cycled + nr_to-write==0", we're always screwed. It simply _cannot_ do 
> > the right thing.
> 
> Doh, I think you spotted the bug. I was going off on the wrong track.
> 
>  
> > I dunno. That whole piece-of-sh*t function has been incredibly buggy this 
> > release. The code is an unreadable mess, and I think that "cyclic" stuff 
> > is part of the reason for it being messy and buggy. Please convince me 
> > it's worth saving, or let me revert the whole stinking pile of crud?
> 
> Well... the cyclic stuff apparently gets used by some filesystems to do
> data integrity stuff, so I think the problem addressed by my broken
> commit maybe shouldn't be ignored.
> 
> Maybe you're being harsh on this function. It has been two thinko bugs
> so far. Before this release cycle it seemed to have the record for the
> most data integrity bugs in a single function...
> 
> How's this? Jan, can you test with the bdb workload please?
  Yes, this patch returns write performance of BDB workload back to
original numbers. Thanks for the fix.

								Honza

> --
> 
> A bug was introduced into write_cache_pages cyclic writeout by commit
> 31a12666d8f0c22235297e1c1575f82061480029. The intention (and comments)
> is that we should cycle back and look for more dirty pages at the
> beginning of the file if there is no more work to be done.
> 
> But the !done condition was dropped from the test. This means that any
> time the page writeout loop breaks (eg. due to nr_to_write == 0), we
> will set index to 0, then goto again. This will set done_index to index,
> then find done is set, so will proceed to the end of the function. When
> updating mapping->writeback_index for cyclic writeout, we now use
> done_index == 0, so we're always cycling back to 0.
> 
> This seemed to be causing random mmap writes (slapadd and iozone) to
> start writing more pages from the LRU and writeout would slowdown.
> 
> With this patch, iozone random write performance is increased nearly
> 5x on my system (iozone -B -r 4k -s 64k -s 512m -s 1200m on ext2). 
> 
> Signed-off-by: Nick Piggin <npiggin@xxxxxxx>
> ---
> Index: linux-2.6/mm/page-writeback.c
> ===================================================================
> --- linux-2.6.orig/mm/page-writeback.c	2009-02-12 13:30:42.000000000 +1100
> +++ linux-2.6/mm/page-writeback.c	2009-02-12 14:16:28.000000000 +1100
> @@ -1079,7 +1079,7 @@ continue_unlock:
>  		pagevec_release(&pvec);
>  		cond_resched();
>  	}
> -	if (!cycled) {
> +	if (!cycled && !done) {
>  		/*
>  		 * range_cyclic:
>  		 * We hit the last page and there is more work to be done: wrap
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe kernel-testers" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html