RE: [PATCH 3/3] pnfsblock: bail out unaligned DIO

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> -----Original Message-----
> From: Myklebust, Trond [mailto:Trond.Myklebust@xxxxxxxxxx]
> Sent: Monday, May 28, 2012 12:39 AM
> To: Peng Tao
> Cc: linux-nfs@xxxxxxxxxxxxxxx; Peng, Tao
> Subject: Re: [PATCH 3/3] pnfsblock: bail out unaligned DIO
> 
> On Sun, 2012-05-27 at 13:33 +0800, Peng Tao wrote:
> > Signed-off-by: Peng Tao <tao.peng@xxxxxxx>
> > ---
> >  fs/nfs/blocklayout/blocklayout.c |   20 ++++++++++++++++++++
> >  1 files changed, 20 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> > index 53cb450..cdb87a9 100644
> > --- a/fs/nfs/blocklayout/blocklayout.c
> > +++ b/fs/nfs/blocklayout/blocklayout.c
> > @@ -1000,7 +1000,27 @@ static bool bl_dio_begin(struct inode *inode, const struct iovec *iov,
> >  			 unsigned long nr_segs, loff_t pos,
> >  			 struct blk_plug *plug)
> >  {
> > +	unsigned blkmask = NFS_SERVER(inode)->pnfs_blksize - 1;
> > +	size_t count;
> > +	int seg;
> > +	unsigned long addr;
> > +
> >  	blk_start_plug(plug);
> > +
> > +	/* Only allow blksized DIO for now.
> > +	 * In theory we can handle page aligned DIO in current block layout
> > +	 * read/write code, but it would require serialization between
> > +	 * concurrent writers and it is far less effecient than just send IO
> > +	 * to MDS.
> > +	 */
> > +	if (pos & blkmask)
> > +		return false;
> > +	for (seg = 0; seg < nr_segs; seg++) {
> > +		addr = (unsigned long)iov[seg].iov_base;
> > +		count = iov[seg].iov_len;
> > +		if (unlikely((addr & blkmask) || (count & blkmask)))
> > +			return false;
> > +	}
> >  	return true;
> >  }
> 
> Again, this can and should go in the existing nfs_pageio_ops either in
> the pg_init or in the pg_test.
> 
As explain in the other mail, it is necessary to have pnfs_dio_begin/end, so I prefer to do the test as early as possible, which is in pnfs_dio_begin.

> Also, why do you consider it to be direct i/o specific? If the
> application is using byte range locking, and the locks aren't page/block
> aligned then you are in the same position of having to deal with partial
> page writes even in the read/write from page cache situation.
You are right about byte range locking + buffered IO, and it should be fixed in pg_test with bellow patch and it could be a stable candidate. It is different from DIO case because for DIO we have to be sure each page is blocksize aligned. And it can't easily be done in pg_test because in pg_test we only have one nfs_page to test against.

Thanks for reviewing.

Cheers,
Tao

>From 14a4d583fae8ee01c34fd17d923001aada1d426d Mon Sep 17 00:00:00 2001
From: Peng Tao <bergwolf@xxxxxxxxx>
Date: Fri, 25 May 2012 21:25:07 +0800
Subject: [PATCH] pnfsblock: bail out page unaligned IO

block layout driver cannot handle these IO.

Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Peng Tao <bergwolf@xxxxxxxxx>
---
 fs/nfs/blocklayout/blocklayout.c |   16 ++++++++++++++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 7ae8a60..a84a0da 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -925,6 +925,18 @@ nfs4_blk_get_deviceinfo(struct nfs_server *server, const struct nfs_fh *fh,
 	return rv;
 }
 
+static bool
+bl_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
+	   struct nfs_page *req)
+{
+	/* Bail out page unligned IO */
+	if (req->wb_offset || req->wb_pgbase ||
+	    req->wb_bytes != PAGE_CACHE_SIZE)
+		return false;
+
+	return pnfs_generic_pg_test(pgio, prev, req);
+}
+
 static int
 bl_set_layoutdriver(struct nfs_server *server, const struct nfs_fh *fh)
 {
@@ -998,13 +1010,13 @@ bl_clear_layoutdriver(struct nfs_server *server)
 
 static const struct nfs_pageio_ops bl_pg_read_ops = {
 	.pg_init = pnfs_generic_pg_init_read,
-	.pg_test = pnfs_generic_pg_test,
+	.pg_test = bl_pg_test,
 	.pg_doio = pnfs_generic_pg_readpages,
 };
 
 static const struct nfs_pageio_ops bl_pg_write_ops = {
 	.pg_init = pnfs_generic_pg_init_write,
-	.pg_test = pnfs_generic_pg_test,
+	.pg_test = bl_pg_test,
 	.pg_doio = pnfs_generic_pg_writepages,
 };
 
-- 
1.7.7.6
��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux