On Wed, 2012-04-18 at 07:18 -0700, Alexandre Depoutovitch wrote:
> NFS daemons always perform buffered I/O on files. As a result, write
> requests that are not aligned on a file system block boundary take
> about 15 times longer to complete than the same writes that are file
> system block aligned. This patch fixes the problem by analyzing the
> alignment of the I/O request that comes to the NFS daemon and using
> the direct I/O mechanism when all of the following are true:
> 1. The request is not aligned on a file system block boundary.
> 2. The request is aligned on the underlying block device's sector
>    boundary.
> 3. The request size is a multiple of the sector size.
> In all other cases, buffered I/O is performed as it has been before.
>
> After applying the patch, the performance of all types of requests
> except unaligned writes remains the same, while the performance of
> unaligned writes improves 15-fold.
>
> A new flag is exposed to users through the /proc/fs/nfsd/direct_io
> node. The default value of 1 results in the behavior described above.
> Writing 0 to the node turns off the optimization and forces the NFS
> daemon to always use buffered I/O (as it has done before). Writing 2
> to the node tells the NFS daemon to use direct I/O even if the
> request is file system block aligned.
>
> I have tested this patch by running concurrent NFS writes to an
> exported file system and verifying locally that the writes reached
> the disk.
> <snip>
> +/*
> + * Perform direct I/O for a given NFS write request.
> + */
> +static ssize_t nfsd_vfs_write_direct(struct file *file,
> +				     const struct iovec *vec,
> +				     unsigned long vlen, loff_t *pos)
> +{
> +	ssize_t result;
> +	unsigned int page_num = 0;
> +	struct iovec *aligned_vec = NULL;
> +	size_t size = iov_length(vec, vlen);
> +
> +	/* Nothing to copy for a zero-length request */
> +	if (size == 0)
> +		return vfs_writev(file, (struct iovec __user *)vec,
> +				  vlen, pos);
> +
> +	/* Allocate the necessary number of pages */
> +	result = nfsd_allocate_paged_iovec(size, &page_num, &aligned_vec);
> +	if (result) {
> +		printk(KERN_WARNING "Cannot allocate aligned_vec.\n");
> +		goto out;
> +	}
> +
> +	/* Copy the request data into the page-aligned buffer */
> +	result = nfsd_copy_iovec(vec, vlen, aligned_vec, page_num, size);
> +	if (result) {
> +		printk(KERN_WARNING "Wrong amount of data copied to aligned buffer.\n");
> +		goto out;
> +	}
> +
> +	/* Pass the aligned iovec on to the VFS */
> +	result = vfs_writev(file, (struct iovec __user *)aligned_vec,
> +			    page_num, pos);
> +
> +out:
> +	nfsd_free_paged_iovec(page_num, aligned_vec);
> +	return result;
> +}

Can this be rewritten to use Dave Kleikamp's iov_iter interface with
asynchronous reads and writes? Otherwise I can't see how it is going to
avoid being mind-numbingly slow.

You can see the LWN description and a link to his patches at
http://lwn.net/Articles/490114/

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com
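[For reference, a minimal userspace sketch of the decision rule the patch
description gives: the three alignment conditions plus the 0/1/2 values of
/proc/fs/nfsd/direct_io. The enum and the function
nfsd_should_use_direct_io() are hypothetical names invented for this
illustration; they do not appear in the patch itself.]

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only. The modes mirror the
 * /proc/fs/nfsd/direct_io values described above:
 * 0 = always buffered, 1 = direct I/O only for unaligned requests
 * (the default), 2 = direct I/O even for block-aligned requests. */
enum nfsd_dio_mode {
	NFSD_DIO_NEVER = 0,
	NFSD_DIO_UNALIGNED_ONLY = 1,
	NFSD_DIO_ALWAYS = 2,
};

static bool nfsd_should_use_direct_io(enum nfsd_dio_mode mode,
				      uint64_t offset, size_t len,
				      unsigned int fs_block_size,
				      unsigned int sector_size)
{
	/* Conditions 2 and 3: the request must start on a device
	 * sector boundary and cover a whole number of sectors,
	 * otherwise direct I/O is not possible at all. */
	bool sector_ok = (offset % sector_size == 0) &&
			 (len % sector_size == 0);
	/* Condition 1: the request is NOT file-system-block aligned. */
	bool fs_aligned = (offset % fs_block_size == 0) &&
			  (len % fs_block_size == 0);

	switch (mode) {
	case NFSD_DIO_NEVER:
		return false;			 /* always buffered */
	case NFSD_DIO_UNALIGNED_ONLY:
		return sector_ok && !fs_aligned; /* default behavior */
	case NFSD_DIO_ALWAYS:
		return sector_ok;	/* even if block aligned */
	}
	return false;
}

With a 4 KiB file system block and 512-byte sectors, a 512-byte write at
offset 512 satisfies all three conditions and goes direct under the
default mode, while a 4 KiB write at offset 0 is block aligned and stays
buffered.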