On Thu 27-10-11 14:31:33, Wu Fengguang wrote:
> On Fri, Oct 21, 2011 at 06:26:16AM +0800, Jan Kara wrote:
> > On Thu 20-10-11 21:39:38, Wu Fengguang wrote:
> > > On Thu, Oct 20, 2011 at 08:33:00PM +0800, Wu Fengguang wrote:
> > > > On Thu, Oct 20, 2011 at 08:09:09PM +0800, Wu Fengguang wrote:
> > > > > Jan,
> > > > >
> > > > > I tried the below combined patch over the ioless one and found some
> > > > > minor regressions. I studied the thresh=1G/ext3-1dd case in particular
> > > > > and found that nr_writeback and the iostat avgrq-sz drop from time to
> > > > > time.
> > > > >
> > > > > I'll try to bisect the changeset.
> > >
> > > This is interesting: the culprit turns out to be patch 1, which is simply
> > >
> > > 	if (work->for_kupdate) {
> > > 		oldest_jif = jiffies -
> > > 			msecs_to_jiffies(dirty_expire_interval * 10);
> > > -		work->older_than_this = &oldest_jif;
> > > -	}
> > > +	} else if (work->for_background)
> > > +		oldest_jif = jiffies;
> >
> >   Yeah. I had a look at the trace and you can see that during the whole
> > dd run we were executing a single background writeback work (you can
> > verify that by work->nr_pages decreasing steadily). Without refreshing
> > oldest_jif, we'd write the block device inode for /dev/sda (you can
> > identify it by bdi=8:0, ino=0) only once. When refreshing oldest_jif, we
> > write it every 5 seconds (kjournald dirties the device inode after
> > committing a transaction by dirtying the metadata buffers which were just
> > committed and can now be checkpointed, either by kjournald or by the
> > flusher thread). So although performance is slightly reduced, I'd say the
> > behavior is a desired one.
> >
> > Also, if you observed the performance over a really long run, the
> > difference should get smaller, because eventually kjournald has to flush
> > the metadata blocks anyway when the journal fills up and we need to free
> > some journal space. At that point flushing is even more expensive,
> > because we have to do a blocking write during which all transaction
> > operations, and thus effectively the whole filesystem, are blocked.
>
> Jan, I got figures for the test case
>
> ext3-1dd-4k-8p-2941M-1000M:10-3.1.0-rc9-ioless-full-nfs-wq5-next-20111014+
>
> There is not a single drop of nr_writeback in the longer 1200s run, which
> wrote ~60GB of data.
  I did some calculations. The default journal size for a filesystem of
your size is 128 MB, which allows recording of around 128 GB of data. So
your test probably didn't hit the point where the journal gets recycled
yet. An easy way to make sure the journal gets recycled is to set it to a
smaller size when creating the filesystem with

	mke2fs -J size=8

Then, at the latest after writing 8 GB, the effect of journal recycling
should be visible (I suggest writing at least 16 GB or so, so that we can
see some pattern).

Also note that without the patch altering background writeback, kjournald
will do all the writeback of the metadata, and kjournald works with buffer
heads. Thus the IO it does is *not* accounted in the mm statistics. You
will observe its effects only as a sudden increase in await or svctm,
because the disk gets busy with IO you don't see. Secondarily, you could
probably also observe it as a hiccup in the number of dirtied/written
pages.

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
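
A minimal sketch of the small-journal experiment Jan suggests above. Only
the -J size=8 journal option and the ~16 GB write size come from the
message itself; the device, mount point, and file name below are
placeholders, not taken from the thread:

	# Create an ext3 filesystem with a deliberately small 8 MB journal
	# so that journal recycling kicks in early in the run
	# (placeholder device /dev/sdb1 and mount point /mnt/test).
	mkfs.ext3 -J size=8 /dev/sdb1
	mount /dev/sdb1 /mnt/test

	# Write well past the journal's ~8 GB recording capacity -- 16 GB,
	# as suggested -- so the journal is recycled several times.
	dd if=/dev/zero of=/mnt/test/big bs=1M count=16384 conv=fdatasync

The 8 GB figure follows the same ratio Jan uses for the default case: a
journal of N MB records roughly N GB of data before it must be recycled.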
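
To catch the effect described in the last paragraph -- kjournald's
buffer-head IO that never shows up in the mm page counters -- one can watch
iostat and /proc/vmstat side by side. This is a sketch assuming a
sysstat-style iostat and the nr_dirtied/nr_written counters present in
kernels of this era:

	# Per-device latencies: a periodic spike in await/svctm with no
	# matching jump in the page counters is IO you "don't see".
	iostat -x 5 /dev/sdb &

	# Page-based writeback counters: expect a hiccup, not a rise,
	# while the journal is being checkpointed.
	while true; do
		grep -E 'nr_dirtied|nr_written' /proc/vmstat
		sleep 5
	done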