On Thu, 25 Oct 2007 14:43:55 -0400 Chris Mason <chris.mason@xxxxxxxxxx> wrote:
> >
> > 2) You mentioned that one of the goals of the benchmark is to measure
> > locality during directory aging, but the workload seems too well
> > ordered to truly age the filesystem. At least that's what I can gather
> > from the output the benchmark spits out. It may be that I'm not
> > understanding the relationship between INITIAL_DIRS and RUNS, but the
> > workload seems to be localized to operations on a single dir at a
> > time. Just wondering if this is truly stressing allocation algorithms
> > in a significant or realistic way.
>
> A good question. compilebench has two modes, and the default is better
> at aging than the run I graphed on ext4. compilebench isn't trying to
> fragment individual files; it is instead trying to fragment locality
> and lower the overall performance of a directory tree.
>
> In the default run, the patch, clean, and compile operations end up
> changing around groups of files in a somewhat random fashion (at least
> from the FS point of view). But it is still a workload where a good
> FS should be able to maintain locality and provide consistent results
> over time.
>
> The ext4 numbers I sent here are from compilebench --makej, which is a
> shorter and less complex run. It has a few simple phases:
>
> * create some number of kernel trees sequentially
> * write new files into those trees in random order
> * read three of the trees
> * delete all the trees
>
> It is a very basic test that can give you a picture of directory
> layout, writeback performance and overall locality.

Thanks. This clears a couple of things up, and I think I now follow the
direction you're heading in with this workload.

> >
> > I really want to use seekwatcher to test some of the stuff that I'm
> > doing for the flex_bg feature, but it barfs on me on my test machine.
> >
> > running :sleep 10:
> > done running sleep 10
> > Device: /dev/sdh
> >   Total:  0 events (dropped 0), 1368 KiB data
> > blktrace done
> > Traceback (most recent call last):
> >   File "/usr/bin/seekwatcher", line 534, in ?
> >     add_range(hist, step, start, size)
> >   File "/usr/bin/seekwatcher", line 522, in add_range
> >     val = hist[slot]
> > IndexError: list index out of range
>
> I don't think you have any events in the trace. Try this instead:
>
> echo 3 > /proc/sys/vm/drop_caches
> seekwatcher -t find-trace -d /dev/xxxx -p 'find /usr/local -type f'

Nope, I get the same error. There does seem to be data recorded in the
trace files, and iostat does show activity on the disk.

toolssf2 ~ # echo 3 > /proc/sys/vm/drop_caches
toolssf2 ~ # seekwatcher -t find-trace -d /dev/sdb3 -p 'find /root -type f >/dev/null'
running :find /root -type f >/dev/null:
done running find /root -type f >/dev/null
Device: /dev/sdb3
  CPU  0:   0 events,   303 KiB data
  CPU  1:   0 events,   262 KiB data
  CPU  2:   0 events,   205 KiB data
  CPU  3:   0 events,   302 KiB data
  CPU  4:   0 events,   240 KiB data
  CPU  5:   0 events,   281 KiB data
  CPU  6:   0 events,   191 KiB data
  CPU  7:   0 events,   281 KiB data
  Total:    0 events (dropped 0),  2061 KiB data
blktrace done
Traceback (most recent call last):
  File "/usr/bin/seekwatcher", line 534, in ?
    add_range(hist, step, start, size)
  File "/usr/bin/seekwatcher", line 522, in add_range
    val = hist[slot]
IndexError: list index out of range
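Before digging further, it might be worth running blkparse directly
against the trace to see whether any events were actually captured;
this assumes seekwatcher leaves the per-CPU blktrace files around under
the name given to -t (find-trace.blktrace.0 and so on):

blkparse -i find-trace | tail -20   # the summary at the end shows the event counts

If the totals there are zero as well, the trace really is empty and
your guess holds; if not, the data is being captured and seekwatcher is
tripping over it while parsing.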
> > This is running on a PPC64/gentoo combination. Don't know if this
> > means anything to you. I have a very basic algorithm to take
> > advantage of block group metadata grouping and want to be able to
> > better visualize how different IO patterns take advantage of, or are
> > hurt by, the feature.
>
> I wanted to benchmark flexbg too, but couldn't quite figure out the
> correct patch combination ;)

I'll attach the e2fsprogs and kernel patches, but do realize that these
are experimental patches that I'm using to test which layout works
best. Don't take them too seriously, as the work is largely incomplete.
I'm currently trying to come up with workloads to test this and other
changes with, and I am warming up to yours :)

To create a filesystem with the feature, just do:

mke2fs -j -I 256 -O flex_bg /dev/xxx

Currently, the number of block groups whose metadata is grouped
together is EXT4_DESC_PER_BLOCK(), which matches the meta_bg feature.
This turns out to be 128 block groups. This may (and probably will)
change in the future, but it gives a general idea of the benefits that
can be had from grouping metadata on a larger scale.
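For reference, the 128 falls out of the group descriptor size: assuming
the default 4 KiB block size and 32-byte group descriptors,
EXT4_DESC_PER_BLOCK() works out to 4096 / 32 = 128. Something along
these lines should confirm the flag took and show where the grouped
bitmaps and inode tables ended up (device name is a placeholder, and
you need the patched e2fsprogs for the feature name to be printed):

dumpe2fs -h /dev/xxx | grep -i features   # superblock feature list, should include flex_bg
dumpe2fs /dev/xxx | less                  # per-group output shows where bitmaps/inode tables sit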
On compilebench it seems to show a 10x improvement on "create dir",
since I'm currently testing on a SCSI disk with the write cache
disabled. I would think the improvements would be a lot less noticeable
on a SATA drive, since those usually ship with write caching enabled.
All the other tests from the --makej runs were measurably better.
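In case anyone wants to run the same comparison, a --makej run is just
something along these lines (mount point and tree count are
placeholders; check compilebench's usage output for the exact option
names):

compilebench --makej -D /mnt/test -i 10   # create, write into, read and delete 10 kernel trees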
Would love to see seekwatcher working so I can tune a bit better,
though.

-JRS

Attachment:
flex_bg_test.tar.bz2
Description: application/bzip