On Tue, 2011-04-12 at 09:17 -0700, Badari Pulavarty wrote:
> On Tue, 2011-04-12 at 11:49 -0400, Trond Myklebust wrote:
> > On Tue, 2011-04-12 at 08:32 -0700, Badari Pulavarty wrote:
> > > Hi,
> > >
> > > We recently ran into a serious performance issue with the NFS
> > > client. It turned out that it's due to the lack of readv/writev
> > > support in the NFS (O_DIRECT) client.
> > >
> > > Here is our use case:
> > >
> > > In our cloud environment, our storage is over NFS. Files
> > > on NFS are passed as block devices to the guest (using
> > > O_DIRECT). When the guest does IO on these block devices,
> > > it ends up as O_DIRECT writes to NFS (on the KVM host).
> > >
> > > QEMU (on the host) gets a vector from the virtio ring and
> > > submits it. Old versions of QEMU linearized the vector
> > > they got from KVM (copied it into a buffer) and submitted
> > > the buffer, so the NFS client always received a single buffer.
> > >
> > > Later versions of QEMU eliminated this copy and submit
> > > the vector directly using preadv()/pwritev().
> > >
> > > The NFS client loops through the vector and submits each
> > > segment as a separate request for each IO < wsize. In our
> > > case (negotiated wsize=1MB), a 256K IO arrives as 64
> > > vectors of 4K each, so we end up submitting 64 4K FILE_SYNC
> > > IOs. The server ends up doing each 4K write synchronously,
> > > which seriously degrades performance. We are trying to see
> > > whether performance improves if we convert the IOs to ASYNC,
> > > but our initial results don't look good.
> > >
> > > Supporting readv/writev in the NFS client for all possible
> > > cases is hard. However, if all vectors are page-aligned and
> > > all IO sizes are page multiples, it fits the current code
> > > easily. Luckily, the QEMU use case meets these requirements.
> > >
> > > Here is the patch to add this support. Comments?
> >
> > Your approach goes in the direction of further special-casing O_DIRECT
> > in the NFS client. I'd like to move away from that and towards
> > integration with the ordinary read/write codepaths so that aside from
> > adding request coalescing, we can also enable pNFS support.
> >
>
> I completely agree. But it's a major undertaking :(

Sure, but it is one that I'm working on. I'm just explaining why I'd
prefer not to include more stop-gap O_DIRECT patches at this point. We
can afford to wait for one more release cycle if it means fixing O_DIRECT
once and for all.

Cheers,
  Trond
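For readers following the thread, here is a minimal userspace sketch in C of the eligibility test the patch description implies: every iovec segment must start on a page boundary and have a length that is a whole number of pages. It also counts how many WRITE requests the same data would need if the segments were coalesced into wsize-sized requests instead of being sent one per segment. The helper names `iov_pages_aligned()` and `rpcs_coalesced()` are hypothetical; they are not taken from the patch or from the kernel's NFS direct-IO code, and the page size and wsize are passed in as plain parameters as a simplifying assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>   /* struct iovec */

/*
 * Hypothetical helper (not from the patch): true when every segment
 * starts on a page boundary and its length is a multiple of the page
 * size, i.e. the restricted case said to fit the existing O_DIRECT code.
 */
static bool iov_pages_aligned(const struct iovec *iov, size_t nr_segs,
                              size_t page_size)
{
    assert(page_size > 0);
    for (size_t i = 0; i < nr_segs; i++) {
        uintptr_t base = (uintptr_t)iov[i].iov_base;

        if (base % page_size != 0)
            return false;   /* segment does not start on a page */
        if (iov[i].iov_len % page_size != 0)
            return false;   /* length is not a whole number of pages */
    }
    return true;
}

/*
 * Hypothetical helper: number of WRITE requests needed if all segments
 * were coalesced into requests of at most wsize bytes (rounded up),
 * instead of one request per segment.
 */
static size_t rpcs_coalesced(const struct iovec *iov, size_t nr_segs,
                             size_t wsize)
{
    size_t total = 0;

    assert(wsize > 0);
    for (size_t i = 0; i < nr_segs; i++)
        total += iov[i].iov_len;
    return (total + wsize - 1) / wsize;
}
```

With 64 page-aligned 4K segments and wsize=1MB, `iov_pages_aligned()` returns true and `rpcs_coalesced()` returns 1, compared with the 64 separate FILE_SYNC requests described in the original report. A single misaligned base address or a non-page-multiple length makes the vector ineligible for this fast path.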