Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

> We would want to know the performance benefits in some detail before even
> looking at the code, no? Maybe they're in here somewhere but I missed it..

Okay... You wanted some benchmarks, here are some. I should try and
automate the procedure since it's pretty straightforward, just time consuming
to do by hand.


ENVIRONMENT
===========

I'm using a pair of computers, one an NFS server, the other an NFS client,
connected by ZyXEL PL-100 ethernet-over-mains adapters to throttle the network
bandwidth. As far as I can tell, the TCP bandwidth as seen by a pair of
netcats communing with each other maxes out at about 890KB/s or 6.95 Mbits/s.
Amazon rates the PL-100's as up to 85Mbits/s, but I don't seem to be getting
anything like that.

The client was rebooted after each test, but the server wasn't. The server
was persuaded to pull the entire working set for each test into RAM to
eliminate disk I/O latencies at that end.

The Ext3 partition used for the cache was tuned to have 4096-byte blocks.

During each run, a watch was put on the FS-Cache statistics on the client
machine:

        watch -n0 cat /proc/fs/fscache/stats

This went over SSH to my desktop machine by GigE ethernet.


FIRST BENCHMARK
===============

The first benchmark involved pulling a 100MB file by NFS to the client using
cat to /dev/zero run under time as a test. The 'Time taken' reported by time
was logged.

The benchmark was repeated three times and the average was taken:

        Cache    RUN #1           RUN #2           RUN #3           AVG
        =======  ===============  ===============  ===============  ===============
        SERVER   0m0.062s
        NONE     1m59.462s        1m59.948s        2m1.852s         2.007 mins
        COLD     1m58.448s        1m59.436s        2m5.746s         2.020 mins
        HOT      0m2.235s         0m2.154s         0m2.171s         0.036 mins
        PGCACHE  0m0.040s

Firstly the test was run on the server twice and the second result logged
(SERVER). Secondly, the client was rebooted and the test was run with the
cachefilesd not started and that was logged (NONE).
After rebooting, the cache contents were erased (mke2fs) and cachefilesd was
started and the test run again, which loaded the cache (COLD). Then the box
was rebooted, cachefilesd was started and the test run a third time, this time
with a populated cache (HOT). This was repeated twice. Finally, for
reference, the client test was run again without unmounting, stopping or
rebooting anything so that the client's pagecache would act as the cache
(PGCACHE).


SECOND BENCHMARK
================

The second benchmark involved pulling a 256MB (as reported by du -s) kernel
tree of 1185 directories containing 19258 files using a single tar to
/dev/zero as a test. The 'Time taken' reported by time was logged.

The benchmark was repeated three times and the average was taken:

        Cache    RUN #1           RUN #2           RUN #3           AVG
        =======  ===============  ===============  ===============  ===============
        SERVER   0m0.348s
        NONE     7m35.335s        7m42.075s        7m32.797s        7.612 mins
        COLD     7m45.117s        7m54.774s        8m2.172s         7.900 mins
        HOT      7m14.970s        7m10.953s        7m16.390s        7.235 mins
        PGCACHE  3m10.864s

The procedure was as for the first benchmark.

For the second benchmark I also gathered data from the /proc/$$/mountstats
file to determine the network loading of run #3. The following table shows
the counts of three different RPC operations issued, and the number of bytes
read over the network as part of READ RPC operations:

        Cache    GETATTR (N)      ACCESS (N)       READ (N)         READ (BYTES)
        =======  ===============  ===============  ===============  ===============
        NONE     22371            20486            21402            221252168
        COLD     22411            20486            21402            221252168
        HOT      22495            20481            0                0


CONCLUSION
==========

As can be seen, the network link I have between my test server and test client
is at about the break-even point for a large quantity of medium-small files (as
might be found in a source tree) with respect to the total time it takes to
completely read the files over NFS.

However, for those medium-small files, the reduction in network loading is huge
for repeat mass reads.
The time went from 7.6mins to 7.2mins, which is nice but not hugely
significant, but the network loading dropped by ~21,000 RPC operations at a
grand total of >220MB of data on the wire, allowing for network metadata,
within those 7 minutes.

For fewer but much larger files the cache has a proportionately greater effect
as the client incurs lower costs from Ext3 lookups as it is doing many fewer of
them, but gains greatly from Ext3's ability to glue large groups of contiguous
reads together and to do lookahead. Similarly to the previous case, having
this data in the cache will reduce the network loading for repeat reads.

A comparison of the second benchmark test run against the server's pagecache
versus that test run against the client's pagecache is quite interesting. The
server can perform the tar in a third of a second, but the client takes over
three minutes. That would indicate that something on the order of just over 3
minutes' worth of time is spent by each of the NONE, COLD and HOT test runs
doing things other than reads. That would be GETATTR, ACCESS, and READDIRPLUS
ops.

Another way of looking at it is that the NONE test of the second benchmark
spends a little over 4 minutes doing READ ops from the network, and that the
HOT test spends almost as much time doing lookup, getxattr and read ops against
Ext3.

It's also worth noting that in neither benchmark did the COLD test take very
much more time than the NONE test, despite doing lookups, mkdirs, creates,
setxattrs and writes in the background.

Of course, these two benchmarks are very much artificial: there was no other
significant loading on the network between the client and the server; there was
no other significant load on either machine; the cache started out empty and
probably got loaded in optimal order; the cache was large enough to never need
culling; only one program (cat or tar) was run at once.
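
The averages in the two tables and the "little over 4 minutes" READ estimate
can be cross-checked with a short script. What follows is a minimal Python
sketch, not part of the original measurements. The names (RUNS,
LINK_BYTES_PER_SEC, to_seconds and so on) are made up for illustration, and it
assumes that "890KB/s" means 890 * 1024 bytes/s, which is the reading that
reproduces the quoted 6.95 Mbits/s figure when 1 Mbit is taken as 2**20 bits.

```python
import re

# Timings copied from the two benchmark tables (as reported by time).
RUNS = {
    "bench1": {
        "NONE": ["1m59.462s", "1m59.948s", "2m1.852s"],
        "COLD": ["1m58.448s", "1m59.436s", "2m5.746s"],
        "HOT": ["0m2.235s", "0m2.154s", "0m2.171s"],
    },
    "bench2": {
        "NONE": ["7m35.335s", "7m42.075s", "7m32.797s"],
        "COLD": ["7m45.117s", "7m54.774s", "8m2.172s"],
        "HOT": ["7m14.970s", "7m10.953s", "7m16.390s"],
    },
}

# Assumption: 890KB/s is 890 * 1024 bytes/s (see the lead-in above).
LINK_BYTES_PER_SEC = 890 * 1024

# READ (BYTES) for the NONE run of the second benchmark, from mountstats.
READ_BYTES_NONE = 221252168


def to_seconds(stamp):
    """Convert an 'XmY.ZZZs' timing string to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time format: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def average_minutes(stamps):
    """Mean of several timings, expressed in minutes."""
    seconds = [to_seconds(s) for s in stamps]
    return sum(seconds) / len(seconds) / 60


def wire_minutes(nbytes, bytes_per_sec=LINK_BYTES_PER_SEC):
    """Time to move nbytes of payload at the measured link rate, in minutes.

    This ignores RPC and TCP overhead, so it is a lower bound.
    """
    return nbytes / bytes_per_sec / 60


if __name__ == "__main__":
    for bench, caches in RUNS.items():
        for cache, stamps in caches.items():
            print(f"{bench} {cache:5s} {average_minutes(stamps):.3f} mins")
    print(f"link rate: {LINK_BYTES_PER_SEC * 8 / 2**20:.2f} Mbits/s")
    print(f"READ payload at link rate: {wire_minutes(READ_BYTES_NONE):.2f} mins")
```

Under that assumption the script reproduces the AVG columns of both tables, and
it puts the 221252168 bytes of READ payload at roughly 4.05 minutes of wire
time, which is consistent with the estimate given for the NONE run.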

David