Re: [PATCH 1/2] writeback: Improve busyloop prevention

Jan Kara <jack@xxxxxxx> · Thu, 3 Nov 2011 02:51:36 +0100



On Thu 03-11-11 02:56:03, Wu Fengguang wrote:
> On Fri, Oct 28, 2011 at 04:31:04AM +0800, Jan Kara wrote:
> > On Thu 27-10-11 14:31:33, Wu Fengguang wrote:
> > > On Fri, Oct 21, 2011 at 06:26:16AM +0800, Jan Kara wrote:
> > > > On Thu 20-10-11 21:39:38, Wu Fengguang wrote:
> > > > > On Thu, Oct 20, 2011 at 08:33:00PM +0800, Wu Fengguang wrote:
> > > > > > On Thu, Oct 20, 2011 at 08:09:09PM +0800, Wu Fengguang wrote:
> > > > > > > Jan,
> > > > > > > 
> > > > > > > > I tried the combined patch below on top of the ioless one, and found some
> > > > > > > > minor regressions. I studied the thresh=1G/ext3-1dd case in particular
> > > > > > > > and found that nr_writeback and the iostat avgrq-sz drop from time to time.
> > > > > > > 
> > > > > > > I'll try to bisect the changeset.
> > > > > 
> > > > > This is interesting, the culprit is found to be patch 1, which is
> > > > > simply
> > > > >                 if (work->for_kupdate) {
> > > > >                         oldest_jif = jiffies -
> > > > >                                 msecs_to_jiffies(dirty_expire_interval * 10);
> > > > > -                       work->older_than_this = &oldest_jif;
> > > > > -               }
> > > > > +               } else if (work->for_background)
> > > > > +                       oldest_jif = jiffies;
> > > >   Yeah. I had a look into the trace and you can notice that during the
> > > > whole dd run, we were running a single background writeback work (you can
> > > > verify that by work->nr_pages decreasing steadily). Without refreshing
> > > > oldest_jif, we'd write block device inode for /dev/sda (you can identify
> > > > that by bdi=8:0, ino=0) only once. When refreshing oldest_jif, we write it
> > > > every 5 seconds (kjournald dirties the device inode after committing a
> > > > transaction by dirtying metadata buffers which were just committed and can
> > > > now be checkpointed either by kjournald or flusher thread). So although the
> > > > performance is slightly reduced, I'd say that the behavior is a desired
> > > > one.
> > > > 
> > > > Also if you observed the performance on a really long run, the difference
> > > > should get smaller because eventually, kjournald has to flush the metadata
> > > > blocks when the journal fills up and we need to free some journal space and
> > > > at that point flushing is even more expensive because we have to do a
> > > > blocking write during which all transaction operations, thus effectively
> > > > the whole filesystem, are blocked.
> > > 
> > > Jan, I got figures for test case
> > > 
> > > ext3-1dd-4k-8p-2941M-1000M:10-3.1.0-rc9-ioless-full-nfs-wq5-next-20111014+
> > > 
> > > > There is not a single drop of nr_writeback in the longer 1200s run, which
> > > > wrote ~60GB of data.
> >   I did some calculations. Default journal size for a filesystem of your
> > size is 128 MB which allows recording of around 128 GB of data. So your
> > test probably didn't hit the point where the journal is recycled yet. An
> > easy way to make sure journal gets recycled is to set its size to a lower
> > value when creating the filesystem by
> >   mke2fs -J size=8
> > 
> >   Then at the latest after writing 8 GB the effect of journal recycling should
> > be visible (I suggest writing at least 16 GB or so, so that we can see some
> > pattern). Also note that without the patch altering background writeback,
> > kjournald will do all the writeback of the metadata and kjournald works with
> > buffer heads. Thus IO it does is *not* accounted in mm statistics. You will
> > observe its effects only by a sudden increase in await or svctm because the
> > disk got busy by IO you don't see. Also secondarily you could probably
> > observe that as a hiccup in the number of dirtied/written pages.
> 
> Jan, finally the `correct' results for "-J size=8" w/o the patch
> altering background writeback.
> 
> I noticed the periodic small drops of nr_writeback in
> global_dirty_state.png, other than that it looks pretty good.
  If you look at iostat graphs, you'll notice periodic increases in await
time at roughly 100 s intervals. I believe this could be checkpointing
that's going on in the background. Also there are (negative) peaks in the
"paused" graph. Anyway, the main question is - do you see any throughput
difference with/without the background writeback patch with the small
journal?
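
  For reference, the effect of refreshing oldest_jif for background work
(discussed earlier in this thread) can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not kernel
code: struct model_inode, eligible() and count_writes() are hypothetical
names, time is counted in whole seconds instead of jiffies, and the flusher
is assumed to make one pass per second over a single inode that gets
redirtied at a fixed period (like the block device inode that kjournald
dirties after a commit).

```c
#include <stdbool.h>

/* Hypothetical user-space model; times are in seconds, not jiffies. */
struct model_inode {
	long dirtied_when;	/* time the inode was last dirtied */
	bool dirty;
};

/* An inode is eligible when it was dirtied at or before the cutoff. */
static bool eligible(const struct model_inode *inode, long older_than_this)
{
	return inode->dirty && inode->dirtied_when <= older_than_this;
}

/*
 * Model one long background work running from t = 0 to t = duration,
 * making one pass per second.  The inode is redirtied every
 * redirty_period seconds if it is clean at that moment.
 * Returns how many times the inode was written.
 */
static int count_writes(long duration, long redirty_period, bool refresh_cutoff)
{
	struct model_inode inode = { .dirtied_when = 0, .dirty = true };
	long oldest = 0;	/* cutoff taken when the work started */
	int writes = 0;

	for (long now = 0; now <= duration; now++) {
		if (now > 0 && now % redirty_period == 0 && !inode.dirty) {
			inode.dirtied_when = now;
			inode.dirty = true;
		}
		if (refresh_cutoff)
			oldest = now;	/* models "oldest_jif = jiffies" on each pass */
		if (eligible(&inode, oldest)) {
			inode.dirty = false;
			writes++;
		}
	}
	return writes;
}
```

  With a 60 s run and a 5 s redirty period, the model writes the inode once
when the cutoff is fixed at the start of the work, and once per redirty when
the cutoff is refreshed on every pass. The real flusher is of course more
involved (queueing, requeueing, multiple inodes), so treat this only as an
illustration of the cutoff logic.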

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR