On 05/28/2012 05:30 AM, tao.peng@xxxxxxx wrote:
>> -----Original Message-----
>> From: Myklebust, Trond [mailto:Trond.Myklebust@xxxxxxxxxx]
<>
>> Also, why do you consider it to be direct i/o specific? If the
>> application is using byte range locking, and the locks aren't page/block
>> aligned then you are in the same position of having to deal with partial
>> page writes even in the read/write from page cache situation.
> You are right about byte range locking + buffered IO, and it should
> be fixed in pg_test with bellow patch and it could be a stable
> candidate.

What?? Please explain. It sounds like you are saying that there is a
very *serious* bug in the current block layout.

From my experiments I know that lots and lots of IO is not page
aligned, even in buffered IO. Actually NFS goes to great lengths not to
do the usual read-modify-write per page. It keeps the byte range that
was written per page and only RPCs the exact offset-length of the
modification, because by definition NFS is byte-aligned IO, not
"blocks" or "sectors".

Please explain what happens now. Is it data corruption, or just slow
performance?

I don't understand. Don't you do the proper read-copy-modify-write
that's mandated by the block layout RFC? Byte aligned? And what have
sectors and PAGES got to do with it? I thought all IO must be "block"
aligned.

In objects-layout we have even worse alignment constraints with raid5
(stripe_size alignment). There we needed to do a (very simple, BTW)
read-modify-write, which involves reading not just partial pages but
also full pages. BTW, we read the surrounding pages into the page
cache, so as not to read them multiple times.

> It is different from DIO case because for DIO we have to
> be sure each page is blocksize aligned. And it can't easily be done
> in pg_test because in pg_test we only have one nfs_page to test
> against.
> <snip>
> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> index 7ae8a60..a84a0da 100644
> --- a/fs/nfs/blocklayout/blocklayout.c
> +++ b/fs/nfs/blocklayout/blocklayout.c
> @@ -925,6 +925,18 @@ nfs4_blk_get_deviceinfo(struct nfs_server *server, const struct nfs_fh *fh,
>  	return rv;
>  }
>
> +static bool
> +bl_pg_test(struct nfs_pageio_descriptor *pgio, struct nfs_page *prev,
> +	   struct nfs_page *req)
> +{
> +	/* Bail out page unligned IO */
> +	if (req->wb_offset || req->wb_pgbase ||
> +	    req->wb_bytes != PAGE_CACHE_SIZE)
> +		return false;
> +

This is very serious. Not many applications will currently pass this
test. (And hence will not do direct IO)

What happens today without this patch?

> +	return pnfs_generic_pg_test(pgio, prev, req);
> +}
> +
<>

Thanks
Boaz
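
To make the read-modify-write point above concrete, here is a minimal
user-space sketch of the range arithmetic involved. It is not kernel
code and not taken from either layout driver. The names rmw_range and
rmw_expand are hypothetical, and the block size is simply a parameter
(it could equally be a raid5 stripe_size). It only shows how a
byte-aligned write is rounded out to block boundaries and how many
bytes must be read in on each side before the blocks can be written.

```c
#include <stdint.h>

/* Hypothetical helper type: a byte range rounded out to block boundaries. */
struct rmw_range {
	uint64_t start;     /* first byte of the block-aligned range */
	uint64_t end;       /* one past the last byte of the aligned range */
	uint64_t head_fill; /* bytes to read in before the written data */
	uint64_t tail_fill; /* bytes to read in after the written data */
};

/*
 * Round the byte range [offset, offset + len) out to multiples of blksz.
 * Assumes blksz > 0 and that offset + len does not overflow.
 */
static struct rmw_range rmw_expand(uint64_t offset, uint64_t len,
				   uint64_t blksz)
{
	struct rmw_range r;
	uint64_t end = offset + len;

	r.start = offset - (offset % blksz);          /* round down */
	r.end = ((end + blksz - 1) / blksz) * blksz;  /* round up */
	r.head_fill = offset - r.start;
	r.tail_fill = r.end - end;
	return r;
}
```

For example, a 300-byte write at offset 5000 with 4096-byte blocks
expands to the range 4096..8192, with 904 bytes to read in front of the
data and 2892 bytes behind it. A write that is already block aligned
gets head_fill == 0 and tail_fill == 0, so nothing has to be read.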