On Wed, 2012-04-18 at 07:18 -0700, Alexandre Depoutovitch wrote:
> NFS daemons always perform buffered I/O on files. As a result, write
> requests that are not aligned on a file system block boundary take
> about 15 times longer to complete than the same writes that are file
> system block aligned. This patch fixes the problem by analyzing the
> alignment of the I/O request that comes to the NFS daemon and using
> the direct I/O mechanism when all of the following are true:
> 1. The request is not aligned on a file system block boundary.
> 2. The request is aligned on the underlying block device's sector
>    boundary.
> 3. The request size is a multiple of the sector size.
> In all other cases, buffered I/O is performed as it has been before.
>
> After applying the patch, the performance of all types of requests
> except unaligned writes remains the same, while the performance of
> unaligned writes improves 15-fold.
>
> A new flag is exposed to users through the /proc/fs/nfsd/direct_io
> node. The default value of 1 results in the behavior described above.
> Writing 0 to the node turns off the optimization and forces the NFS
> daemon to always use buffered I/O (as it has done before). Writing 2
> to the node tells the NFS daemon to use direct I/O even if the
> request is file system block aligned.
>
> I have tested this patch by running concurrent NFS writes to an
> exported file system and verifying locally that the writes reached
> the disk.
> <snip>
> +/*
> + * Perform direct I/O for a given NFS write request.
> + */
> +static ssize_t nfsd_vfs_write_direct(struct file *file,
> +				     const struct iovec *vec,
> +				     unsigned long vlen, loff_t *pos)
> +{
> +	ssize_t result;
> +	unsigned int page_num = 0;
> +	struct iovec *aligned_vec = NULL;
> +	size_t size = iov_length(vec, vlen);
> +
> +	/* Nothing to copy for a zero-length request */
> +	if (size == 0)
> +		return vfs_writev(file, (struct iovec __user *)vec,
> +				  vlen, pos);
> +
> +	/* Allocate the necessary number of pages */
> +	result = nfsd_allocate_paged_iovec(size, &page_num, &aligned_vec);
> +	if (result) {
> +		printk(KERN_WARNING "Cannot allocate aligned_vec.\n");
> +		goto out;
> +	}
> +
> +	/* Copy the request data into the page-aligned buffer */
> +	result = nfsd_copy_iovec(vec, vlen, aligned_vec, page_num, size);
> +	if (result) {
> +		printk(KERN_WARNING "Wrong amount of data copied to aligned buffer.\n");
> +		goto out;
> +	}
> +
> +	/* Pass the aligned iovec on to the VFS */
> +	result = vfs_writev(file, (struct iovec __user *)aligned_vec,
> +			    page_num, pos);
> +
> +out:
> +	nfsd_free_paged_iovec(page_num, aligned_vec);
> +	return result;
> +}

Can this be rewritten to use Dave Kleikamp's iov_iter interface with
asynchronous reads and writes? Otherwise I can't see how it is going to
avoid being mind-numbingly slow.

You can see the LWN description and a link to his patches at
http://lwn.net/Articles/490114/

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com
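[For reference, a minimal userspace sketch of the decision rule the patch
description gives: the three alignment conditions plus the 0/1/2 values of
/proc/fs/nfsd/direct_io. The enum and the function
nfsd_should_use_direct_io() are hypothetical names invented for this
illustration; they do not appear in the patch itself.]

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only. The modes mirror the
 * /proc/fs/nfsd/direct_io values described above:
 * 0 = always buffered, 1 = direct I/O only for unaligned requests
 * (the default), 2 = direct I/O even for block-aligned requests. */
enum nfsd_dio_mode {
	NFSD_DIO_NEVER = 0,
	NFSD_DIO_UNALIGNED_ONLY = 1,
	NFSD_DIO_ALWAYS = 2,
};

static bool nfsd_should_use_direct_io(enum nfsd_dio_mode mode,
				      uint64_t offset, size_t len,
				      unsigned int fs_block_size,
				      unsigned int sector_size)
{
	/* Conditions 2 and 3: the request must start on a device
	 * sector boundary and cover a whole number of sectors,
	 * otherwise direct I/O is not possible at all. */
	bool sector_ok = (offset % sector_size == 0) &&
			 (len % sector_size == 0);
	/* Condition 1: the request is NOT file-system-block aligned. */
	bool fs_aligned = (offset % fs_block_size == 0) &&
			  (len % fs_block_size == 0);

	switch (mode) {
	case NFSD_DIO_NEVER:
		return false;			 /* always buffered */
	case NFSD_DIO_UNALIGNED_ONLY:
		return sector_ok && !fs_aligned; /* default behavior */
	case NFSD_DIO_ALWAYS:
		return sector_ok;	/* even if block aligned */
	}
	return false;
}

With a 4 KiB file system block and 512-byte sectors, a 512-byte write at
offset 512 satisfies all three conditions and goes direct under the
default mode, while a 4 KiB write at offset 0 is block aligned and stays
buffered.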