> On Dec 20, 2019, at 3:53 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>
> On Fri, Dec 20, 2019 at 3:11 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>>
>>> On Dec 20, 2019, at 3:04 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>
>>> On Fri, Dec 20, 2019 at 1:28 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>>>>
>>>>> On Dec 20, 2019, at 1:15 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>>>
>>>>> On Wed, Dec 18, 2019 at 2:34 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>>>>>>
>>>>>>> On Dec 18, 2019, at 2:31 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> On Wed, Dec 18, 2019 at 2:05 PM Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> On Wed, 2019-12-18 at 12:47 -0500, Olga Kornievskaia wrote:
>>>>>>>>> Hi folks,
>>>>>>>>>
>>>>>>>>> Is this a well-known but undocumented fact that you can't set a large
>>>>>>>>> number of ACLs (over 4096 bytes, ~90 ACLs) while mounted using
>>>>>>>>> krb5i/krb5p? That if you want to get/set large ACLs, it must be done
>>>>>>>>> over auth_sys/krb5?
>>>>>>>>
>>>>>>>> It's certainly not something that I was aware of. Do you see where that
>>>>>>>> limitation is coming from?
>>>>>>>
>>>>>>> I haven't figured it out exactly, but gss_unwrap_resp_integ() is failing
>>>>>>> at "if (mic_offset > rcv_buf->len)". I'm just not sure who sets up the
>>>>>>> buffer (or why rcv_buf->len (4280) is larger than a page; a page limit
>>>>>>> would make sense to me, but that's not it). So you think it should have
>>>>>>> been working.
>>>>>>
>>>>>> The buffer is set up in the XDR encoder. But pages can be added
>>>>>> by the transport... I guess rcv_buf->len isn't updated when that
>>>>>> happens.
>>>>>
>>>>> Here's why ACL + krb5i/krb5p is failing.
>>>>>
>>>>> The acl tool first calls into the kernel to find out how large a buffer
>>>>> it needs to supply and gets the ACL size, then calls down again; then
>>>>> the code in __nfs4_get_acl_uncached() allocates a number of pages (this
>>>>> is what sets the available buffer length later used by the sunrpc code).
>>>>> That works for non-integrity because in call_decode() the call to
>>>>> rpc_unwrap_resp() doesn't try to calculate the checksum on the buffer
>>>>> that was just read. However, with krb5i/krb5p we have a truncated
>>>>> buffer and a MIC offset that's larger than the existing buffer.
>>>>>
>>>>> I think something needs to be marked to skip doing GSS for the initial
>>>>> ACL query? I first tried providing more space in
>>>>> __nfs4_get_acl_uncached() for when authflavor=krb5i/krb5p and buflen=0,
>>>>> but no matter what the number is, the received ACL can be larger than
>>>>> that, so I don't think that's a good approach.
>>>>
>>>> It's not strictly true that the received ACL can always be larger.
>>>> There is an upper bound on request sizes.
>>>>
>>>> My preference has always been to allocate a receive buffer of the maximum
>>>> size before the call, just like every other request works. I can't think
>>>> of any reason why retrieving an ACL has to be different. Then we can get
>>>> rid of the hack in the transports to fill in those pages behind the back
>>>> of the upper layers.
>>>>
>>>> The issue here has always been that there's no way for the client to
>>>> discover the number of bytes it needs to retrieve before it sets up the
>>>> GETACL.
>>>>
>>>> For NFSv4.1+ you can probably assume that the ACL will never be larger
>>>> than the session's maximum reply size.
>>>>
>>>> For NFSv4.0 you'll have to make something up.
>>>>
>>>> But allocating a large receive buffer for this request is the only way to
>>>> make the receive reliable. You should be able to do that by stuffing the
>>>> recv XDR buffer with individual pages, just like nfsd does, in GETACL's
>>>> encoding function.
>>>>
>>>> Others might have a different opinion. Or I might have completely
>>>> misunderstood the issue.
>>>
>>> Putting a limit would be easier. I thought of using rsize (wsize), since
>>> we can't get anything larger than that in the payload, but that's not
>>> possible, because the code sets limits based on XATTR_MAX_SIZE, which is
>>> a Linux server-side limitation and doesn't seem appropriate to apply as
>>> a generic implementation. Would it be OK to change the static memory
>>> allocation to be dynamic and based on the rsize? Thoughts?
>>
>> Why is using the NFSv4.1 session max reply size not possible? For
>> NFSv4.0, rsize seems reasonable to me.
>
> It's not possible because there is a hard limit on the number of pages
> the code will allocate (right now):
>
> static ssize_t __nfs4_get_acl_uncached(struct inode *inode, void *buf,
>                                        size_t buflen)
> {
>         struct page *pages[NFS4ACL_MAXPAGES + 1] = {NULL, };
>
> NFS4ACL_MAXPAGES is based on the 64K limit (from XATTR_MAX_SIZE):
>
>         if (npages > ARRAY_SIZE(pages))
>                 return -ERANGE;
>
> Typically the session size (or r/wsize) is something like 262K or 1M.
>
> I was just saying that I'd then need to remove the static structure for
> pages and make it dynamic based on the rsize (or session size).

IMO you should do that. There should be a page array available in the
recv XDR buffer.

> I thought that r/wsize was set to whatever the session sizes are, so
> using the r/wsize values would make it work for both 4.0 and 4.1+.

<shrug> OK... that choice should be documented in a comment.

--
Chuck Lever
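
For illustration, here is a minimal sketch of the approach discussed above:
replacing the fixed NFS4ACL_MAXPAGES array with a page array sized from the
server's rsize (or, for NFSv4.1+, the session maximum reply size). The helper
name nfs4_alloc_acl_pages() and its exact shape are hypothetical; struct
nfs_server->rsize, kcalloc(), alloc_page(), and __free_page() are existing
kernel interfaces, but this is not the patch that was eventually merged.

        #include <linux/kernel.h>
        #include <linux/gfp.h>
        #include <linux/slab.h>
        #include <linux/nfs_fs_sb.h>

        /*
         * Hypothetical helper: size the GETACL receive page array from the
         * server's rsize instead of the static bound derived from the 64K
         * xattr limit.
         */
        static struct page **nfs4_alloc_acl_pages(struct nfs_server *server,
                                                  unsigned int *npages)
        {
                struct page **pages;
                unsigned int n, i;

                /* Assume the ACL reply payload never exceeds rsize. */
                n = DIV_ROUND_UP(server->rsize, PAGE_SIZE);

                pages = kcalloc(n, sizeof(*pages), GFP_KERNEL);
                if (!pages)
                        return NULL;

                for (i = 0; i < n; i++) {
                        pages[i] = alloc_page(GFP_KERNEL);
                        if (!pages[i])
                                goto out_free;
                }

                *npages = n;
                return pages;

        out_free:
                while (i-- > 0)
                        __free_page(pages[i]);
                kfree(pages);
                return NULL;
        }

A caller such as __nfs4_get_acl_uncached() could then pass this array into
the GETACL arguments so the receive XDR buffer already covers the full rsize,
and gss_unwrap_resp_integ() never sees a MIC offset beyond rcv_buf->len.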