On Tue, Jan 22, 2013 at 11:49:45PM +0100, Samuel Kvasnica wrote:
> On 01/22/2013 11:22 PM, Dave Chinner wrote:
> > On Tue, Jan 22, 2013 at 12:10:28PM +0100, Samuel Kvasnica wrote:
> >> Hi folks,
> >>
> >> I would like to hear about your experience with the performance of XFS
> >> when used on an NFS client mounted over an Infiniband RDMA connection on
> >> a 3.4.11 kernel.
> >>
> >> What we observe is the following:
> >>
> >> - we do have local RAID storage with 1.4GB/s read and write performance
> >>   (both dd on the raw partition and on the xfs filesystem give basically
> >>   the same performance)
> >>
> >> - we do have a QDR Infiniband connection (Mellanox); the rdma benchmark
> >>   gives 29Gbit/s throughput
> >>
> >> Now, both points above look pretty OK, but if we mount an nfs export
> >> using rdma on the client we never get the 1.4GB/s throughput.
> > Of course not. The efficiency of the NFS client/server write
> > protocol makes it a theoretical impossibility....
> hmm, well, never seen that bottleneck on NFSv4 so far. Does this apply
> to NFSv4 as well (as I use NFSv4, not v3)?

Yup, it's the same algorithm.

> >> Sporadically (and especially at the beginning) it comes up to some
> >> 1.3GB/s for a short period, but then it starts oscillating
> >> between 300MB/s and some 1.2GB/s with an average of 500-600MB/s. Even
> >> when using more clients in parallel,
> >> the net throughput behaves the same, so it seems to be a server-side
> >> bottleneck.
> >> We do not see any remarkable CPU load.
> > Sounds exactly like the usual NFS server/client writeback exclusion
> > behaviour, i.e. while there is a commit being processed by the
> > server, the client is not sending any new writes across the wire.
> > Hence you get the behaviour:
> >
> >     client              server
> >     send writes         cache writes
> >     send commit         fsync
> >                         start writeback
> >                         ......
> >                         finish writeback
> >                         send commit response
> >     send writes         cache writes
> >     send commit         fsync
> >                         start writeback
> >                         ......
> >                         finish writeback
> >                         send commit response
> >
> > and so you see binary throughput - either traffic comes across the
> > wire, or the data is being written to disk. They don't happen at the
> > same time.
> >
> > If it's not scaling with multiple clients, then that implies you
> > don't have enough nfsds configured to handle the incoming IO
> > requests. This is a common enough NFS problem; you should be able
> > to find tips from google dating back years on how to tune your
> > NFS setup to avoid these sorts of problems.
> Ok, this explanation partially makes sense; on the other hand we are
> speaking here about just 1.4GB/s, which is a pretty low
> load for a Xeon E5 to process, and the oscillation between 300-600MB/s is
> even more low-end.
>
> And: why do we see exactly the same for read, not only for write?

A lack of NFSDs, or a too-small r/wsize, or not enough readahead on
the client and/or server, etc.
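
For reference, a rough sketch of the usual knobs. The thread count,
device name and mount line below are illustrative only - adjust them to
your distro, your RAID device and whatever rdma/version options you
already mount with:

    # how many nfsd threads the server is currently running
    cat /proc/fs/nfsd/threads

    # bump the thread count on the fly, e.g. to 64 (make it persistent
    # via RPCNFSDCOUNT or your distro's nfs server config)
    rpc.nfsd 64

    # on the client, ask for large transfers; the server caps rsize and
    # wsize at its own maximum, so the effective values may be smaller
    mount -o rdma,port=20049,rsize=1048576,wsize=1048576 server:/export /mnt

    # server-side readahead on the backing device (the argument is in
    # 512-byte sectors; /dev/md0 is just a placeholder)
    blockdev --setra 16384 /dev/md0

Client-side NFS readahead is tunable via read_ahead_kb under
/sys/class/bdi/ for the mount's backing device info. If adding nfsd
threads makes multiple clients scale, that was the bottleneck; if not,
look at r/wsize and readahead next.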

> It looks to me like there is some too-large buffer somewhere on the way
> which needs to be decreased, as it is not needed at all.
> I cannot recall seeing this earlier on 2.6.2x kernels; unfortunately I
> cannot test that on the new hardware.
>
> >> The interesting point is, we use the btrfs filesystem on the server
> >> instead of xfs now (with an otherwise identical config) and we are
> >> getting consistent, steady throughput around 1.2-1.3GB/s.
> > Different fsync implementation, or the btrfs configuration is
> > ignoring commits (async export, by any chance?)
> Well, there is no explicit async mount option. With btrfs write gives

Sure - I'm talking about the server export option, not a client mount
option.

And - seriously - you need to check that btrfs is actually honouring
commits correctly, otherwise data integrity is compromised (the async
export option makes the server ignore commits)....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs