On Thu, Sep 02, 2010 at 12:37:39AM -0500, Stan Hoeppner wrote:
> Dave Chinner put forth on 9/1/2010 1:44 AM:
>
> > 4p VM w/ 2GB RAM with the disk image on a hw-RAID1 device made up of
> > 2x500GB SATA drives (create and remove 800k files):
> >
> > FSUse%        Count         Size    Files/sec     App Overhead
> >      2       800000            0      54517.1          6465501
> >
> > The same test run on an 8p VM w/ 16GB RAM, with the disk image hosted
> > on a 12x2TB SAS dm RAID-0 array:
> >
> > FSUse%        Count         Size    Files/sec     App Overhead
> >      2       800000            0      51409.5          6186336
>
> Is this a single socket quad core Intel machine with hyperthreading
> enabled?

No, it's a dual socket (8c/16t) server.

> That would fully explain the results above.  Looks like you
> ran out of memory bandwidth in the 4 "processor" case.  Adding phantom
> CPUs merely made them churn without additional results.

No, that's definitely not the case. A different kernel in the same 8p
VM, 12x2TB SAS storage, w/ 4 threads, mount options "logbsize=262144":

FSUse%        Count         Size    Files/sec     App Overhead
     0       800000            0      39554.2          7590355

4 threads with mount options "logbsize=262144,delaylog":

FSUse%        Count         Size    Files/sec     App Overhead
     0       800000            0      67269.7          5697246

http://userweb.kernel.org/~dgc/shrinker-2.6.36/fs_mark-2.6.36-rc3-4-thread-delaylog-comparison.png

The top chart is CPU usage, the second chart is disk IOPS (purple is
write), the third chart is disk bandwidth (purple is write), and the
bottom chart is create rate (yellow) and unlink rate (green).

From left to right, the first IO peak (~1000 IOPS, 250MB/s) is the
mkfs.xfs. The next sustained load is the first fs_mark workload without
delayed logging - 2500 IOPS and 500MB/s - and the second is the same
workload again with delayed logging enabled (zero IO, roughly 400% CPU
utilisation and significantly higher create/unlink rates).

I'll let you decide for yourself which of the two IO patterns is
sustainable on a single SATA disk. ;)

> > FSUse%        Count         Size    Files/sec     App Overhead
> >      2       800000            0      15118.3          7524424
> >
> > delayed logging is 3.6x faster on the same filesystem. It went from
> > 15k files/s at ~120% CPU utilisation, to 54k files/s at 400% CPU
> > utilisation. IOWs, it is _clearly_ CPU bound with delayed logging as
> > there is no idle CPU left in the VM at all.
>
> Without seeing all of what you have available, going on strictly the
> data above, I disagree.  I'd say your bottleneck is your memory/IPC
> bandwidth.

You are free to choose to believe I don't know what I'm doing - if you
can get XFS to perform better, then I'll happily take the patches ;)

> If my guess about your platform is correct, try testing on a dual socket
> quad core Opteron with quad memory channels.  Test with 2, 4, 6, and 8
> fs_mark threads.

Did that a long time ago - it's in the archives a few months back.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
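[Editor's note: a minimal sketch of the kind of run being compared above.
The mount options, 4 threads, 800k files and zero file size come from the
thread; the device, mount point and the remaining fs_mark flags are
assumptions, not the exact invocation used.]

    # placeholder scratch device and mount point
    mkfs.xfs -f /dev/sdX

    # baseline run: 256k log buffers, no delayed logging
    mount -o logbsize=262144 /dev/sdX /mnt/scratch

    # delayed logging run: same, plus the delaylog mount option (2.6.35+)
    # mount -o logbsize=262144,delaylog /dev/sdX /mnt/scratch

    mkdir -p /mnt/scratch/{0,1,2,3}

    # fs_mark runs one thread per -d directory; zero-length files, no
    # fsync (-S0), 4 x 200000 = 800k files total, deleted again at the
    # end of the run (fs_mark removes its files unless -k is given)
    fs_mark -S0 -s 0 -n 200000 \
            -d /mnt/scratch/0 -d /mnt/scratch/1 \
            -d /mnt/scratch/2 -d /mnt/scratch/3

Using one -d per thread is how fs_mark is parallelised in these runs, so
the 4-thread results above correspond to four target directories.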