On 07/05/12 18:54, Andreas Dilger wrote:
> On 2012-05-07, at 10:44 AM, Daniel Pocock wrote:
>
>> On 07/05/12 18:25, Martin Steigerwald wrote:
>>
>>> On Monday, 7 May 2012, Daniel Pocock wrote:
>>>
>>>> 2x SATA drive (NCQ, 32MB cache, no hardware RAID)
>>>> md RAID1
>>>> LVM
>>>> ext4
>>>>
>>>> a) If I use data=ordered,barrier=1 and `hdparm -W 1' on the drive,
>>>>    I observe write performance over NFS of 1MB/sec (unpacking a
>>>>    big source tarball)
>>>>
>>>> b) If I use data=writeback,barrier=0 and `hdparm -W 1' on the drive,
>>>>    I observe write performance over NFS of 10MB/sec
>>>>
>>>> c) If I just use the async option on NFS, I observe up to 30MB/sec
>>>>
> The only proper way to isolate the cause of performance problems is
> to test each layer separately.
>
> What is the performance running this workload against the same ext4
> filesystem locally (i.e. without NFS)?  How big are the files?  If
> you run some kind of low-level benchmark against the underlying MD
> RAID array, with synchronous IOPS of the average file size, what is
> the performance?

- the test file is 5MB compressed, over 100MB uncompressed, many C++
  files of varying sizes
- testing it locally is definitely faster, but local disk writes can
  be cached more aggressively than writes from an NFS client, so it
  is not strictly comparable

> Do you have something like the MD RAID resync bitmaps enabled?  That
> can kill performance, though it improves the rebuild time after a
> crash.  Putting these bitmaps onto a small SSD, or e.g. a separate
> boot disk (if you have one) can improve performance significantly.

I've checked /proc/mdstat and it doesn't report any bitmap at all.

>>> c) won't harm local filesystem consistency, but should the NFS
>>> server break down, all data that the NFS clients sent to the server
>>> for writing and which has not yet been written is gone.
>>>
>> Most of the access is from NFS, so (c) is not a good solution either.
>>
> Well, this behaviour is not significantly worse than applications
> writing to a local filesystem, and the node crashing and losing the
> dirty data in memory that has not been written to disk.

A lot of the documents I've seen about NFS performance suggest it is
slightly worse, though, because the applications on the client have
already received positive responses from fsync().

>>>> - or must I just use option (b) but make it safer with a
>>>>   battery-backed write cache?
>>>>
>>> If you want performance and safety, that is the best option of the
>>> ones you mentioned, if the workload is really I/O bound on the
>>> local filesystem.
>>>
>>> Of course you can try the usual tricks like noatime, removing the
>>> rsize and wsize options on the NFS clients if they have a new
>>> enough kernel (they autotune to much higher values than the often
>>> recommended 8192 or 32768 bytes, look at /proc/mounts), putting the
>>> ext4 journal onto an extra disk to reduce head seeks, checking
>>> whether enough NFS server threads are running, trying a different
>>> filesystem and so on.
>>>
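To work through those suggestions methodically, I am checking the
current settings with roughly the commands below (the device and
volume names, /dev/sdb1 and /dev/vg00/export, are only placeholders
for my actual layout):

  # on an NFS client: the rsize/wsize that were actually negotiated
  grep nfs /proc/mounts

  # on the server: number of nfsd threads currently running
  grep ^th /proc/net/rpc/nfsd

  # on the server: the ext4 mount options currently in effect
  grep ext4 /proc/mounts

  # rough sketch for moving the ext4 journal to a separate disk
  # (filesystem must be unmounted; the journal device needs the same
  # block size as the filesystem)
  mke2fs -O journal_dev /dev/sdb1
  tune2fs -O ^has_journal /dev/vg00/export
  tune2fs -j -J device=/dev/sdb1 /dev/vg00/export

An external journal only helps if the bottleneck really is
journal-induced head seeks, so I would only try that after the
simpler checks above.
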
>> One further discovery I made: I decided to eliminate md and LVM.  I
>> had enough space to create a 256MB partition on one of the disks and
>> format it directly with ext4.
>>
>> Writing to that partition from the NFSv3 client:
>> - less than 500kBytes/sec (for unpacking a tarball of source code)
>> - around 50MB/sec (dd if=/dev/zero conv=fsync bs=65536)
>>
>> I then connected an old 5400rpm USB disk to the machine and ran the
>> same test from the NFS client:
>> - 5MBytes/sec (for unpacking a tarball of source code) - 10x faster
>>   than the 7200rpm SATA disk
>>
> Possibly the older disk is lying about doing cache flushes.  The
> wonderful disk manufacturers do that with commodity drives to make
> their benchmark numbers look better.  If you run some random IOPS
> test against this disk, and it has performance much over 100 IOPS
> then it is definitely not doing real cache flushes.

I would agree that is possible - I actually tried using hdparm and
sdparm to check the cache status, but they don't work with the USB
drive.

I've tried the following directly on the raw device:

  dd if=/dev/zero of=/dev/sdc1 bs=4096 count=65536 conv=fsync

which reported 29.2MB/s, while iostat reported an average of 250
writes/sec, avgrq-sz = 237 and wkB/s = 30MB/sec.

I tried a smaller write as well (just count=1024, total 4MB of data)
and it also reported a similarly slow speed, which suggests that it
really is writing the data out to disk and not just caching it.
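To test the theory that the USB disk is lying about cache flushes, the
next thing I intend to run is a random synchronous-write test along the
lines Andreas suggested; a rough sketch (this assumes fio is installed,
uses my USB disk /dev/sdc1, and will overwrite data on it):

  # ~30 seconds of random, synchronous 4k writes straight to the device
  fio --name=sync-randwrite --filename=/dev/sdc1 \
      --rw=randwrite --bs=4k --ioengine=sync --sync=1 \
      --direct=1 --size=64m --runtime=30 --time_based

If that reports much more than roughly 100 IOPS on a 5400rpm drive,
then by Andreas's rule of thumb the drive is acknowledging writes
without actually committing them to the platters.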