On 08/09/2012 05:30 AM, Peng Tao wrote:
> On Thu, Aug 9, 2012 at 2:57 AM, Myklebust, Trond
> <Trond.Myklebust@xxxxxxxxxx> wrote:
>> On Wed, 2012-08-08 at 10:03 +0800, Peng Tao wrote:
>>> From: Peng Tao <tao.peng@xxxxxxx>
>>>
>>> We don't have the real IO size information in buffer read, but for direct
>>> read, we can get it from dreq->bytes_left. Let's make use of it.
>>
>> What's wrong with using the nfs_readpages arguments? The readahead code
>> will tell you exactly how many pages it is trying to read.
>>
> The nr_pages information gets dropped before calling into pnfs code...
>
> How about something like below for buffer reads? (untested patch, and it
> needs to integrate with the DIO size for sure).
>
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 2e00fea..cec79c2 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -1162,7 +1162,7 @@ pnfs_generic_pg_init_read(struct nfs_pageio_descriptor *pgio, struct nfs_page *r
>  	pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
>  					   req->wb_context,
>  					   req_offset(req),
> -					   req->wb_bytes,
> +					   (unsigned)pgio->pg_layout_private,
>  					   IOMODE_READ,
>  					   GFP_KERNEL);
>  	/* If no lseg, fall back to read through mds */
> diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> index b6bdb18..64fb0e2 100644
> --- a/fs/nfs/read.c
> +++ b/fs/nfs/read.c
> @@ -649,6 +649,7 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
>
>  	NFS_PROTO(inode)->read_pageio_init(&pgio, inode,
>  				&nfs_async_read_completion_ops);
>
> +	pgio.pg_layout_private = (void *)nr_pages;

NACK!!! pgio.pg_layout_private is for use by the LD only. Generic code
should never touch it. If you actually need it, and this information is
really lost, please use a new member.

>  	ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc);
>
>  	nfs_pageio_complete(&pgio);
>

For reads, for both files and objects, but I think also for blocks, the
optimum size is just offset-to-i_size. I do not see any resource being
constrained at the server. If the extent list gets too big to be sent as
one RPC for the blocks layout (as can be the case for objects when
spanning groups), the server can just send the biggest chunk it wants
that fits, or consider some other optimum max size according to its
internal topology knowledge. The server may always send a smaller layout
as long as it includes "offset"; if so, the client will return for the
remainder. I test this all the time and it works perfectly.

Actually, for exofs, regardless of the requested size I always return a
full group_size for read-layouts of the group that includes "offset".
Only write-layouts need be sliced thin in exofs.

Boaz
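
For illustration only (not from the original thread): a minimal sketch of
the "use a new member" approach suggested above, assuming a hypothetical
pg_nr_pages field on struct nfs_pageio_descriptor. Hunk offsets are
omitted, the surrounding context lines are from memory of the 3.5-era
tree, and the diff is untested.

diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ @@ struct nfs_pageio_descriptor {
 	struct nfs_direct_req	*pg_dreq;
 	void			*pg_layout_private;	/* LD use only */
+	unsigned int		pg_nr_pages;	/* read size hint, in pages */
 };
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
 	NFS_PROTO(inode)->read_pageio_init(&pgio, inode,
 				&nfs_async_read_completion_ops);
 
+	/* record the readahead size so pnfs can size its layout request */
+	pgio.pg_nr_pages = nr_pages;
 	ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ @@ pnfs_generic_pg_init_read(struct nfs_pageio_descriptor *pgio, struct nfs_page *req)
 	pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
 					   req->wb_context,
 					   req_offset(req),
-					   req->wb_bytes,
+					   pgio->pg_nr_pages ?
+					      (u64)pgio->pg_nr_pages << PAGE_CACHE_SHIFT :
+					      req->wb_bytes,
 					   IOMODE_READ,
 					   GFP_KERNEL);

This leaves pg_layout_private to the layout driver, and falls back to
req->wb_bytes when no readahead hint was recorded (e.g. a single-page
read, or the direct-I/O path, which could instead feed dreq->bytes_left
through the same field).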