On Sun, 2012-05-27 at 13:33 +0800, Peng Tao wrote:
> Signed-off-by: Peng Tao <tao.peng@xxxxxxx>
> ---
>  fs/nfs/blocklayout/blocklayout.c |   20 ++++++++++++++++++++
>  1 files changed, 20 insertions(+), 0 deletions(-)
>
> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> index 53cb450..cdb87a9 100644
> --- a/fs/nfs/blocklayout/blocklayout.c
> +++ b/fs/nfs/blocklayout/blocklayout.c
> @@ -1000,7 +1000,27 @@ static bool bl_dio_begin(struct inode *inode, const struct iovec *iov,
>  				unsigned long nr_segs, loff_t pos,
>  				struct blk_plug *plug)
>  {
> +	unsigned blkmask = NFS_SERVER(inode)->pnfs_blksize - 1;
> +	size_t count;
> +	int seg;
> +	unsigned long addr;
> +
>  	blk_start_plug(plug);
> +
> +	/* Only allow blksized DIO for now.
> +	 * In theory we can handle page aligned DIO in current block layout
> +	 * read/write code, but it would require serialization between
> +	 * concurrent writers and it is far less effecient than just send IO
> +	 * to MDS.
> +	 */
> +	if (pos & blkmask)
> +		return false;
> +	for (seg = 0; seg < nr_segs; seg++) {
> +		addr = (unsigned long)iov[seg].iov_base;
> +		count = iov[seg].iov_len;
> +		if (unlikely((addr & blkmask) || (count & blkmask)))
> +			return false;
> +	}
>  	return true;
>  }

Again, this can and should go in the existing nfs_pageio_ops either in
the pg_init or in the pg_test.

Also, why do you consider it to be direct i/o specific? If the
application is using byte range locking, and the locks aren't page/block
aligned then you are in the same position of having to deal with partial
page writes even in the read/write from page cache situation.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com
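
For readers following the thread, the alignment test in the quoted hunk can be tried outside the kernel. The following is only a user-space sketch, not the patch itself: the helper name dio_is_blk_aligned and its blksize parameter are hypothetical, struct iovec comes from <sys/uio.h>, and the block size is assumed to be a nonzero power of two so that blksize - 1 works as a mask.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical user-space helper (not kernel code): returns true only when
 * the file offset and every segment's base address and length are multiples
 * of blksize. Assumes blksize is a nonzero power of two.
 */
static bool dio_is_blk_aligned(const struct iovec *iov, unsigned long nr_segs,
			       off_t pos, size_t blksize)
{
	uintptr_t blkmask = (uintptr_t)blksize - 1;
	unsigned long seg;

	/* The starting file offset must sit on a block boundary. */
	if ((unsigned long long)pos & blkmask)
		return false;

	for (seg = 0; seg < nr_segs; seg++) {
		uintptr_t addr = (uintptr_t)iov[seg].iov_base;
		size_t count = iov[seg].iov_len;

		/* Reject any segment whose address or length is unaligned. */
		if ((addr & blkmask) || (count & blkmask))
			return false;
	}
	return true;
}
```

Nothing in this test depends on the I/O being direct, which is consistent with the question raised above: the same offset and length check would apply to a request that arrives through the page cache.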