After seeing the newer work a couple of weeks ago, I decided to rerun the tests against Dave Chinner's tree just to see how things fare with his changes. This time I ran only the "socket test," both due to time constraints and because the "storage test" didn't produce anything particularly interesting last time.

Unfortunately I was unable to use the hardware I used previously, so I reran the test against four kernels: the 2.6.35 base, 2.6.35 plus Nick's changes, the 2.6.36 base, and 2.6.36 plus Dave's changes. The change in hardware shifted the absolute test results substantially, so I'm making no comparisons with the previous runs.

Once more I ran the "socket test" on systems with a moderate number of cores and memory (unfortunately I can't say more about the hardware), gathering test results and kernel profiling data for each kernel. (See the end of this message for a note on how this kind of profile can be gathered.)

The "socket test" does a lot of socket operations; it fields lots of connections, receiving and transmitting small amounts of data over each. The application it emulates has run into bottlenecks on the dcache_lock and the inode_lock several times in the past, which is why I chose it as a target. The test is multithreaded, with at least one thread per core, and is designed to put as much load as possible on the application being tested. It is in fact designed specifically to find performance regressions (albeit at a higher level than the kernel), which makes it very suitable for this work. (A rough sketch of this kind of load generator also appears at the end of this message.)

The kernels were very stable; I saw no crashes or hangs during my testing.

The "socket test" has a target rate, which I'll refer to as 100%. Internal Google kernels (with modifications to specific code paths) generally allow the test to achieve that rate, albeit not without substantial effort. Against the base 2.6.35 kernel I saw a rate of around 13.9%; the modified 2.6.35 kernel had a rate of around 8.38%. The base 2.6.36 kernel was effectively unchanged relative to the 2.6.35 base, with a rate of 14.12%; likewise, the modified 2.6.36 kernel had a rate of around 9.1%. In each case the difference between the 2.6.35 and 2.6.36 results is small and expected given the environment.

The kernel profiles were not nearly as straightforward this time. Comparing the base and modified 2.6.35 kernels, there were some fairly subtle improvements with respect to the dcache locking: as before, the amount of time spent there dropped slightly. This time, however, both kernels spent essentially the same amount of time in locking primitives overall. I also compared the 2.6.35 base and 2.6.36 base kernels, to see where improvements might show up without Nick's and Dave's changes; there, time spent in locking primitives dropped slightly but consistently. Comparing the base and modified 2.6.36 kernels, there again seemed to be some subtle improvements to the dcache locking, but otherwise the tests spent about the same amount of time in the locking primitives.

As before, the original profiles are available at

	http://code.google.com/p/vfs-scaling-eval/downloads/list

The newer data is marked as "Socket_test-profile-<kernel version>-<name>". Data from the previous evaluation is there as well.
--
Frank Mayhar <fmayhar@xxxxxxxxxx>
Google Inc.
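A note on gathering profiles: if you don't have dedicated profiling infrastructure, one straightforward way to collect comparable data (not necessarily what was used for the profiles above) is the in-kernel perf tool, available since 2.6.31. It can record a system-wide profile with call graphs while the test runs, for example:

	perf record -a -g -- sleep 60    # sample all CPUs for 60 seconds
	perf report                      # break the samples down by symbol

Time spent in locking primitives like _spin_lock then shows up directly in the symbol breakdown, along with the call chains that got there.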
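Since I can't share the test itself, here is a minimal sketch of a load generator in the same spirit: one thread per core, each repeatedly connecting to a server and exchanging a small payload. The host, port, payload size, and connection count are illustrative assumptions, not the real test's parameters.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define CONNS_PER_THREAD 1000	/* assumed per-thread connection count */
#define PAYLOAD 64		/* assumed (small) transfer size */

static struct sockaddr_in target;

static void *worker(void *arg)
{
	char buf[PAYLOAD] = { 0 };
	int i;

	(void)arg;
	for (i = 0; i < CONNS_PER_THREAD; i++) {
		int fd = socket(AF_INET, SOCK_STREAM, 0);

		if (fd < 0)
			continue;
		if (connect(fd, (struct sockaddr *)&target,
			    sizeof(target)) == 0) {
			/* transmit and receive a small amount of data */
			if (write(fd, buf, sizeof(buf)) == sizeof(buf))
				(void)read(fd, buf, sizeof(buf));
		}
		close(fd);
	}
	return NULL;
}

int main(int argc, char **argv)
{
	long i, nthreads = sysconf(_SC_NPROCESSORS_ONLN); /* 1 thread/core */
	pthread_t *tids;

	if (nthreads < 1)
		nthreads = 1;
	tids = calloc(nthreads, sizeof(*tids));

	memset(&target, 0, sizeof(target));
	target.sin_family = AF_INET;
	target.sin_port = htons(argc > 2 ? atoi(argv[2]) : 8080);
	inet_pton(AF_INET, argc > 1 ? argv[1] : "127.0.0.1",
		  &target.sin_addr);

	for (i = 0; i < nthreads; i++)
		pthread_create(&tids[i], NULL, worker, NULL);
	for (i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	free(tids);
	return 0;
}

The reason a workload like this hammers the dcache and inode locks at all is that every socket allocates an inode (and a dentry) on the internal sockfs mount, and every close() tears them down again, so a connection-heavy test exercises those paths constantly even though it never touches an on-disk filesystem.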