> -----Original Message----- > From: Myklebust, Trond [mailto:Trond.Myklebust@xxxxxxxxxx] > Sent: Monday, May 28, 2012 12:39 AM > To: Peng Tao > Cc: linux-nfs@xxxxxxxxxxxxxxx; Peng, Tao > Subject: Re: [PATCH 3/3] pnfsblock: bail out unaligned DIO > > On Sun, 2012-05-27 at 13:33 +0800, Peng Tao wrote: > > Signed-off-by: Peng Tao <tao.peng@xxxxxxx> > > --- > > fs/nfs/blocklayout/blocklayout.c | 20 ++++++++++++++++++++ > > 1 files changed, 20 insertions(+), 0 deletions(-) > > > > diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c > > index 53cb450..cdb87a9 100644 > > --- a/fs/nfs/blocklayout/blocklayout.c > > +++ b/fs/nfs/blocklayout/blocklayout.c > > @@ -1000,7 +1000,27 @@ static bool bl_dio_begin(struct inode *inode, const struct iovec *iov, > > unsigned long nr_segs, loff_t pos, > > struct blk_plug *plug) > > { > > + unsigned blkmask = NFS_SERVER(inode)->pnfs_blksize - 1; > > + size_t count; > > + int seg; > > + unsigned long addr; > > + > > blk_start_plug(plug); > > + > > + /* Only allow blksized DIO for now. > > + * In theory we can handle page aligned DIO in current block layout > > + * read/write code, but it would require serialization between > > + * concurrent writers and it is far less effecient than just send IO > > + * to MDS. > > + */ > > + if (pos & blkmask) > > + return false; > > + for (seg = 0; seg < nr_segs; seg++) { > > + addr = (unsigned long)iov[seg].iov_base; > > + count = iov[seg].iov_len; > > + if (unlikely((addr & blkmask) || (count & blkmask))) > > + return false; > > + } > > return true; > > } > > Again, this can and should go in the existing nfs_pageio_ops either in > the pg_init or in the pg_test. > As explain in the other mail, it is necessary to have pnfs_dio_begin/end, so I prefer to do the test as early as possible, which is in pnfs_dio_begin. > Also, why do you consider it to be direct i/o specific? If the > application is using byte range locking, and the locks aren't page/block > aligned then you are in the same position of having to deal with partial > page writes even in the read/write from page cache situation. You are right about byte range locking + buffered IO, and it should be fixed in pg_test with bellow patch and it could be a stable candidate. It is different from DIO case because for DIO we have to be sure each page is blocksize aligned. And it can't easily be done in pg_test because in pg_test we only have one nfs_page to test against. Thanks for reviewing. Cheers, Tao >From 14a4d583fae8ee01c34fd17d923001aada1d426d Mon Sep 17 00:00:00 2001 From: Peng Tao <bergwolf@xxxxxxxxx> Date: Fri, 25 May 2012 21:25:07 +0800 Subject: [PATCH] pnfsblock: bail out page unaligned IO block layout driver cannot handle these IO. Cc: stable@xxxxxxxxxxxxxxx Signed-off-by: Peng Tao <bergwolf@xxxxxxxxx> --- fs/nfs/blocklayout/blocklayout.c | 16 ++++++++++++++-- 1 files changed, 14 insertions(+), 2 deletions(-) diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c index 7ae8a60..a84a0da 100644 --- a/fs/nfs/blocklayout/blocklayout.c +++ b/fs/nfs/blocklayout/blocklayout.c @@ -925,6 +925,18 @@ nfs4_blk_get_deviceinfo(struct nfs_server *server, const struct nfs_fh *fh, return rv; } +static bool +bl_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev, + struct nfs_page *req) +{ + /* Bail out page unligned IO */ + if (req->wb_offset || req->wb_pgbase || + req->wb_bytes != PAGE_CACHE_SIZE) + return false; + + return pnfs_generic_pg_test(pgio, prev, req); +} + static int bl_set_layoutdriver(struct nfs_server *server, const struct nfs_fh *fh) { @@ -998,13 +1010,13 @@ bl_clear_layoutdriver(struct nfs_server *server) static const struct nfs_pageio_ops bl_pg_read_ops = { .pg_init = pnfs_generic_pg_init_read, - .pg_test = pnfs_generic_pg_test, + .pg_test = bl_pg_test, .pg_doio = pnfs_generic_pg_readpages, }; static const struct nfs_pageio_ops bl_pg_write_ops = { .pg_init = pnfs_generic_pg_init_write, - .pg_test = pnfs_generic_pg_test, + .pg_test = bl_pg_test, .pg_doio = pnfs_generic_pg_writepages, }; -- 1.7.7.6 ��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥