Re: [PATCH 6/6] NFSv4: allow getacl rpc to allocate pages on demand

> On Feb 20, 2017, at 11:09 AM, J. Bruce Fields <bfields@xxxxxxxxxx> wrote:
> 
> On Sun, Feb 19, 2017 at 02:29:03PM -0500, Chuck Lever wrote:
>> 
>>> On Feb 18, 2017, at 9:07 PM, J. Bruce Fields <bfields@xxxxxxxxxx> wrote:
>>> 
>>> From: Weston Andros Adamson <dros@xxxxxxxxxx>
>>> 
>>> Instead of preallocating pages, allow xdr_partial_copy_from_skb() to
>>> allocate whatever pages we need on demand.  This is what the NFSv3 ACL
>>> code does.
>> 
>> The patch description does not explain why this change is
>> being done.
> 
> The only justification I see is avoiding allocating pages unnecessarily.

That makes sense. Is there a real-world workload that has seen
a negative effect?


> Without this patch, for each getacl, we allocate 17 pages (if I'm
> calculating correctly) and probably rarely use most of them.
> 
> In the v3 case I think it's 7 pages instead of 17.

I would have guessed 9. Out of curiosity, is there a reason
documented for these size limits?


> Do we have reason to believe that's actually a big deal?

The xprtrdma hack already has to allocate the full set of
pages for NFSACL GETACL.

If NFSv4 GETATTR(fs_acl4) already works this way and there
are no real problems, I can't see any issue with NFSACL GETACL
using the same mechanism to retrieve smaller objects.

The only risk of overallocating is that it could drive some
page reclaims. The upper layer should be in a better position
to prevent deadlock in this case than the transport layer is.
However if NFSv4 doesn't see a problem here, then there isn't
likely to be an issue for NFSACL GETACL, IMO.


> --b.
> 
>> The matching hack in xprtrdma is in rpcrdma_convert_iovs().
>> Note that those are GFP_ATOMIC allocations, whereas here
>> they are GFP_KERNEL, and are thus more reliable.
>> 
>> IMO this is a step in the wrong direction. We should not be
>> adding more upper layer dependencies on memory allocation
>> in the transport layer.
>> 
>> I would strongly prefer that the NFSACL code work the way
>> this code currently does, and that these hacks be removed from
>> the transport implementations.
>> 
>> 
>>> Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxx>
>>> ---
>>> fs/nfs/nfs4proc.c | 23 +++++++----------------
>>> 1 file changed, 7 insertions(+), 16 deletions(-)
>>> 
>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>> index 3e3dbba4aa74..7842c73fddfc 100644
>>> --- a/fs/nfs/nfs4proc.c
>>> +++ b/fs/nfs/nfs4proc.c
>>> @@ -5068,6 +5068,7 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen)
>>> 	struct page *pages[NFS4ACL_MAXPAGES + 1] = {NULL, };
>>> 	struct nfs_getaclargs args = {
>>> 		.fh = NFS_FH(inode),
>>> +		/* The xdr layer may allocate pages here. */
>> 
>> Sure, it is called xdr_partial_copy_from_skb, but that function
>> lives in socklib.c and is invoked only from xprtsock.c. Also, a
>> similar hack has to be done in xprtrdma.
>> 
>> So IMO this is a transport layer hack, and not part of the
>> (generic) XDR layer.
>> 
>> 
>>> 		.acl_pages = pages,
>>> 	};
>>> 	struct nfs_getaclres res = {
>>> @@ -5079,32 +5080,22 @@ static ssize_t nfs4_do_get_acl(struct inode *inode, void *buf, size_t buflen)
>>> 		.rpc_argp = &args,
>>> 		.rpc_resp = &res,
>>> 	};
>>> -	unsigned int npages = DIV_ROUND_UP(buflen, PAGE_SIZE) + 1;
>>> -	int ret = -ENOMEM, i;
>>> -
>>> -	if (npages > ARRAY_SIZE(pages))
>>> -		return -ERANGE;
>>> -
>>> -	for (i = 0; i < npages; i++) {
>>> -		pages[i] = alloc_page(GFP_KERNEL);
>>> -		if (!pages[i])
>>> -			goto out_free;
>>> -	}
>>> +	int ret, i;
>>> 
>>> 	/* for decoding across pages */
>>> 	res.acl_scratch = alloc_page(GFP_KERNEL);
>>> 	if (!res.acl_scratch)
>>> -		goto out_free;
>>> +		return -ENOMEM;
>>> 
>>> -	args.acl_len = npages * PAGE_SIZE;
>>> +	args.acl_len = ARRAY_SIZE(pages) << PAGE_SHIFT;
>>> 
>>> -	dprintk("%s  buf %p buflen %zu npages %d args.acl_len %zu\n",
>>> -		__func__, buf, buflen, npages, args.acl_len);
>>> +	dprintk("%s  buf %p buflen %zu args.acl_len %zu\n",
>>> +		__func__, buf, buflen, args.acl_len);
>>> 	ret = nfs4_call_sync(NFS_SERVER(inode)->client, NFS_SERVER(inode),
>>> 			     &msg, &args.seq_args, &res.seq_res, 0);
>>> 	if (ret == 0)
>>> 		ret = res.acl_len;
>>> -out_free:
>>> +
>>> 	for (i = 0; i < ARRAY_SIZE(pages) && pages[i]; i++)
>>> 		__free_page(pages[i]);
>>> 	__free_page(res.acl_scratch);
>>> -- 
>>> 2.9.3
>>> 
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 

--
Chuck Lever





