Re: [PATCH 1/2] writeback: Improve busyloop prevention

On Thu 27-10-11 14:47:45, Wu Fengguang wrote:
> On Fri, Oct 21, 2011 at 06:26:16AM +0800, Jan Kara wrote:
> > On Thu 20-10-11 21:39:38, Wu Fengguang wrote:
> > > On Thu, Oct 20, 2011 at 08:33:00PM +0800, Wu Fengguang wrote:
> > > > On Thu, Oct 20, 2011 at 08:09:09PM +0800, Wu Fengguang wrote:
> > > > > Jan,
> > > > > 
> > > > > I tried the below combined patch over the ioless one and found some
> > > > > minor regressions. I studied the thresh=1G/ext3-1dd case in particular
> > > > > and found that nr_writeback and the iostat avgrq-sz drop from time to time.
> > > > > 
> > > > > I'll try to bisect the changeset.
> > > 
> > > This is interesting: the culprit turns out to be patch 1, which is
> > > simply
> > >                 if (work->for_kupdate) {
> > >                         oldest_jif = jiffies -
> > >                                 msecs_to_jiffies(dirty_expire_interval * 10);
> > > -                       work->older_than_this = &oldest_jif;
> > > -               }
> > > +               } else if (work->for_background)
> > > +                       oldest_jif = jiffies;
> >   Yeah. I had a look into the trace and you can notice that during the
> > whole dd run, we were running a single background writeback work (you can
> > verify that by work->nr_pages decreasing steadily). Without refreshing
> > oldest_jif, we'd write the block device inode for /dev/sda (you can
> > identify it by bdi=8:0, ino=0) only once. When refreshing oldest_jif, we
> > write it every 5 seconds (kjournald dirties the device inode after
> > committing a transaction by dirtying the metadata buffers which were just
> > committed and can now be checkpointed either by kjournald or the flusher
> > thread). So although the performance is slightly reduced, I'd say that
> > the behavior is the desired one.
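
  To make the effect of the refresh concrete, here is a toy user-space
model of the expiry test applied when queueing dirty inodes for IO. It only
mimics the time_after()-style comparison the kernel uses against
work->older_than_this; the function names, HZ value and timings below are
made up for illustration and this is not the actual fs/fs-writeback.c code:

#include <stdbool.h>
#include <stdio.h>

#define HZ 1000                          /* pretend 1 jiffy == 1 ms */

/* an inode is skipped if it was dirtied after the cutoff */
static bool dirtied_after(unsigned long dirtied_when, unsigned long cutoff)
{
        return (long)(dirtied_when - cutoff) > 0;
}

int main(void)
{
        unsigned long work_start = 1000000;      /* background work begins */
        unsigned long now = work_start + 7 * HZ; /* 7 s into the work */
        /* kjournald dirtied the block device inode 5 s into the work */
        unsigned long bdev_dirtied = work_start + 5 * HZ;

        unsigned long stale_cutoff = work_start; /* oldest_jif set only once */
        unsigned long fresh_cutoff = now;        /* oldest_jif refreshed */

        printf("stale cutoff:     bdev inode eligible = %d\n",
               !dirtied_after(bdev_dirtied, stale_cutoff));
        printf("refreshed cutoff: bdev inode eligible = %d\n",
               !dirtied_after(bdev_dirtied, fresh_cutoff));
        return 0;
}

With the stale cutoff, the block device inode dirtied in the middle of the
long background work never becomes eligible again; with the refreshed
cutoff it becomes eligible on every pass, which is why it gets written
every ~5 seconds.
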
> 
> 
> This is part of
> ext3-1dd-4k-8p-2941M-1024M:10-3.1.0-rc9-ioless-full-more_io_wait-next-20111014+/trace.bz2
> 
>        flush-8:0-3133  [000]   357.418668: writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES dirtied_when=4294728197 age=297 index=3443711 to_write=6144 wrote=6144
>        flush-8:0-3133  [000]   357.954817: writeback_single_inode: bdi 8:0: ino=0 state= dirtied_when=4295024858 age=1 index=3708770 to_write=6144 wrote=59
>        flush-8:0-3133  [000]   358.509053: writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES dirtied_when=4294728197 age=298 index=3449855 to_write=6144 wrote=6144
>        flush-8:0-3133  [002]   358.774390: writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES dirtied_when=4294728197 age=298 index=3455999 to_write=6144 wrote=6144
>        flush-8:0-3133  [002]   358.783747: writeback_single_inode: bdi 8:0: ino=12 state=I_DIRTY_PAGES dirtied_when=4294728197 age=298 index=3462143 to_write=6144 wrote=6144
> 
> I noticed that there is nothing else blocking the flusher because the
> write_begin trace events always start immediately after each
> writeback_single_inode event.
> 
> And the writeback for ino 0 took 357.954817-357.418668 = ~500ms but
> only wrote 59 pages, which is the major reason nr_writeback drops
> from ~80MB to ~50MB.
  Yes, this is not a big surprise. Metadata (indirect blocks) is
interleaved with the data, so all the metadata IO consists of single-block
(4 KB) writes, unlike the data IO which comes in 4 MB chunks. This is also
why you see the request size drop in the graphs you attached to the next
email. Also, since each 4 KB IO request requires a seek, and seek times are
on the order of 5-15 ms, metadata writeback of 59 pages easily ends up
taking around 500 ms. The filesystem could do better and avoid these seeks,
but it does not, and it is beyond the task of writeback to fix that.
Writeback just has to live with the fact that some pages are fast to write
(e.g. the sequential data writes you generate) and some pages are slow to
write (in this case metadata pages).
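
  Back of the envelope, assuming roughly one seek per 4 KB metadata write
and an average seek of ~8.5 ms (the middle of the 5-15 ms range above):
59 * 8.5 ms ~= 500 ms, which is in the ballpark of the 357.418668 ->
357.954817 gap (~536 ms) in your trace.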

> What's more, nr_writeback is only able to recover
> after this event (perhaps ext3 is cleaning up something during that
> time, which blocks dd in write_begin() or whatever):
> 
>               dd-3556  [000]   358.768670: balance_dirty_pages: bdi 8:0: limit=262144 setpoint=213489 dirty=212003 bdi_setpoint=213389 bdi_dirty=211640 dirty_ratelimit=46580 task_ratelimit=47444 dirtied=149 dirtied_pause=149 period=12 think=196 pause=-184 paused=0
  Well, my bet would be that dd needed to access some metadata block (such
as a free block bitmap) which was under IO, so dd was blocked until the IO
finished.

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR

