Re: [RFC PATCH] fix krb5p mount not providing large enough buffer in rq_rcvsize

> On Mar 10, 2020, at 7:56 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
> 
> 
> 
>> On Mar 10, 2020, at 5:07 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>> 
>> Hi Chuck,
>> 
>> On Tue, Mar 10, 2020 at 3:57 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>>> 
>>> Hi Olga-
>>> 
>>>> On Mar 10, 2020, at 2:58 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>> 
>>>> Commit 2c94b8eca1a26 "SUNRPC: Use au_rslack when computing
>>>> reply buffer size" changed how "req->rq_rcvsize" is calculated. It
>>>> used to use the au_cslack value, which was nice and large; it now uses
>>>> the au_rslack value, which turns out to be too small.
>>>> 
>>>> Since 5.1, a v3 mount with sec=krb5p fails against an Ontap server
>>>> because the client's receive buffer is too small.
>>> 
>>> Can you be more specific? For instance, why is 100 bytes adequate for
>>> Linux servers, but not OnTAP?
>> 
>> I don't know why Ontap sends more data than Linux server.
> 
> Let's be sure we are fixing the right problem. Yes, au_rslack is
> smaller in v5.1, and that results in a behavioral regression. But
> exactly which part of the new calculation is incorrect is not yet
> clear. Simply bumping GSS_VERF_SLACK could very well plaster over
> the real problem.
> 
> 
>> The opaque_len is just a lot larger. For the first message, the Linux
>> opaque_len is 120 bytes and Ontap's is 206. The difference could come,
>> for instance, from FSINFO, which sends the file handle: for NetApp the
>> file handle is 44 bytes and for Linux it's only 28 bytes.
> 
> The maximum filehandle size should already be accounted for in the
> maxsize macro for FSINFO.
> 
> Is this problem evident only with NFSv3 plus krb5p?
> 
> 
>>> Is this explanation for the current value not correct?
>>> 
>>> 51 /* length of a krb5 verifier (48), plus data added before arguments when
>>> 52  * using integrity (two 4-byte integers): */
>> 
>> I'm not sure what it is supposed to be. Can't "data before arguments"
>> vary in length and thus explain why the Linux and Ontap sizes are
>> different?
>> Looking at the network trace, the krb5 verifier I see is 36 bytes.
> 
> GSS_VERF_SLACK is only for the extra length added by GSS data. The
> length of the RPC message itself is handled separately, see above.
> 
> Can you post a Wireshark dissection of the problematic FSINFO reply?
> (Having a working reply from Linux and a failing reply from OnTAP
> would be even better).
> 
> 
>>>> For GSS, au_rslack is calculated from GSS_VERF_SLACK value which is
>>>> currently 100. And it's not enough. Changing it to 104 works and then
>>>> au_rslack is recalculated based on actual received mic.len and not
>>>> just the default buffer size.
> 
> What are the computed au_ralign and au_rslack values after the first
> successful operation?
> 
> 
>>>> I would like to propose to change it to something a little larger than
>>>> 104, like 120 to give room if some other server might reply with
>>>> something even larger.
>>> 
>>> Why does it need to be larger than 104?
>> 
>> I don't know why 100 was chosen, and I think arguments are taken
>> into account and arguments can change. I think NetApp has
>> changed their file handle sizes (at some point, not in the near past
>> but I think so?). Perhaps they might want to do that again, so the size
>> will change again.
>> 
>> Honestly, I would have liked for 100 to be 200 to be safe.
> 
> To be safe, I would like to have a good understanding of the details,
> rather than guessing at an arbitrary maximum value. Let's choose a
> rational maximum and include a descriptive comment about why that value
> is the best choice.

As author of 2c94b8eca1a26 I'm interested in helping resolve this
issue. I've audited this code again, and reviewed the git log.

Interestingly, this commit:

commit adeb8133dd57f380e70a389a89a2ea3ae227f9e2
Author:     Olga Kornievskaia <aglo@xxxxxxxxxxxxxx>
AuthorDate: Mon Dec 4 20:22:34 2006 -0500
Commit:     Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
CommitDate: Wed Dec 6 10:46:44 2006 -0500

    rpc: spkm3 update

changed GSS_VERF_SLACK from 56 to 100 without changing the documenting
comment or explaining the increase. That is the only change to
GSS_VERF_SLACK since 2006.

Also, au_rslack has always been set to GSS_VERF_SLACK. But I think
after 2c94b8eca1a26 ("SUNRPC: Use au_rslack when computing reply
buffer size"), GSS_VERF_SLACK is not the right symbolic constant to
use as an initial value of au_rslack.

Before that commit, rslack was not used to compute the receive buffer
slack, so the initial value was probably not interesting.

Since that commit, rslack is meant to be the size of GSS information
that _trails_ the RPC message payload. And ralign is intended to be
the size of the GSS information that _precedes_ that payload.
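
To make the roles of the two values concrete, here is a rough sketch.
It follows the description above and is not the kernel's actual
arithmetic. The struct and function names are made up for
illustration, and the assumption is that sizes are kept in 4-byte XDR
words while the slack constant is given in bytes.

```c
#include <stddef.h>

/* Hypothetical model of the two GSS size estimates, in 4-byte XDR words. */
struct gss_size_estimate {
	size_t ralign_words;	/* GSS data that precedes the payload */
	size_t rslack_words;	/* GSS data that trails the payload */
};

/* Convert a byte-valued slack constant to XDR words, rounding down
 * the way a right shift by two does. */
static size_t slack_bytes_to_words(size_t bytes)
{
	return bytes >> 2;
}

/* Bytes to reserve for a reply: fixed reply header, leading GSS data,
 * the RPC payload itself, then trailing GSS data. */
static size_t reply_buffer_bytes(size_t hdr_words, size_t payload_words,
				 const struct gss_size_estimate *est)
{
	return (hdr_words + est->ralign_words + payload_words +
		est->rslack_words) << 2;
}
```

Under that model a slack of 100 bytes is 25 words and 104 bytes is 26
words, so the reported fix amounts to exactly one more XDR word of
trailing room.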


That doesn't address the problem of how to size the trailing GSS
information. I consulted RFC 2203 and 5403 hoping to find some
protocol-defined maximum for the size of the trailing GSS information
in integrity- and privacy-wrapped messages. Browsing through these
did not reveal any new wisdom (though I admit that I could have
misread these documents).

RFC 2203 Section 5.3.2.2 contains the structural definition of an
integrity-wrapped message:

      struct rpc_gss_integ_data {
          opaque databody_integ<>;
          opaque checksum<>;
      };

   The databody_integ field is created as follows.  A structure
   consisting of a sequence number followed by the procedure arguments
   is constructed. This is shown below as the type rpc_gss_data_t:

      struct rpc_gss_data_t {
          unsigned int seq_num;
          proc_req_arg_t arg;
      };

Note the use of empty angle brackets. These are variable-length opaques
with no pre-defined maximum size.
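
For reference, the on-the-wire size of such a message can be sketched
as below. This assumes only the standard XDR rule for variable-length
opaques (a 4-byte length word followed by the data, padded to a
multiple of four). The function names are hypothetical, and the size of
the checksum is an input because it depends on the GSS mechanism.

```c
#include <stddef.h>

/* On-the-wire size of an XDR variable-length opaque holding len bytes:
 * a 4-byte length word plus the data rounded up to a multiple of 4. */
static size_t xdr_opaque_bytes(size_t len)
{
	return 4 + ((len + 3) & ~(size_t)3);
}

/* Size of an integrity-wrapped body: databody_integ carries a 4-byte
 * seq_num plus the already XDR-encoded arguments or results, and
 * checksum carries the MIC token. */
static size_t integ_wrapped_bytes(size_t arg_len, size_t mic_len)
{
	return xdr_opaque_bytes(4 + arg_len) + xdr_opaque_bytes(mic_len);
}
```

With illustrative numbers only, 28 bytes of arguments and a 36-byte
checksum come to 76 bytes on the wire.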

RFC 2203 Section 5.3.2.3 explains the construction of privacy-wrapped
messages:

   When data privacy is used, the request data is represented as
   follows:

      struct rpc_gss_priv_data {
          opaque databody_priv<>
      };

   The databody_priv field is created as follows.  The rpc_gss_data_t
   structure described earlier is constructed again in the same way as
   for the case of data integrity.  Next, the GSS_Wrap() call is invoked
   to encrypt the octet stream corresponding to the rpc_gss_data_t
   structure, using the same value for QOP (argument qop_req to
   GSS_Wrap()) as was used for the header checksum (in the verifier) and
   conf_req_flag (an argument to GSS_Wrap()) of TRUE.  The GSS_Wrap()
   call returns an opaque octet stream (representing the encrypted
   rpc_gss_data_t structure) and its length, and this is encoded as the
   databody_priv field. Since databody_priv has an XDR type of opaque,
   the length returned by GSS_Wrap() is encoded as the four octet
   length, followed by the encrypted octet stream (padded to a multiple
   of four octets).
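
The encoding rule in the last sentence of that passage can be sketched
as follows. This is an illustration only: the function name is made up,
and the token passed in merely stands in for whatever GSS_Wrap()
returned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode len bytes of wrapped data as an XDR opaque: a big-endian
 * 4-byte length, the data, then zero padding up to a multiple of 4.
 * Returns the number of bytes written, or 0 if out is too small. */
static size_t encode_xdr_opaque(uint8_t *out, size_t out_cap,
				const uint8_t *data, uint32_t len)
{
	size_t padded;

	if (len > UINT32_MAX - 3)
		return 0;	/* padded length would not fit in 32 bits */
	padded = ((size_t)len + 3) & ~(size_t)3;
	if (out_cap < 4 || out_cap - 4 < padded)
		return 0;

	out[0] = (uint8_t)(len >> 24);
	out[1] = (uint8_t)(len >> 16);
	out[2] = (uint8_t)(len >> 8);
	out[3] = (uint8_t)len;
	if (len)
		memcpy(out + 4, data, len);
	memset(out + 4 + len, 0, padded - len);
	return 4 + padded;
}
```

The length word records the unpadded size, so the receiver has to
reserve room for the padding on top of what the length word reports.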

And unfortunately this text is just as vague about the maximum size of
such messages. So:

x  The trailing information is not part of the RPC header verifier field,
so the use of GSS_VERF_SLACK is an arbitrary choice and not explanatory
as the initial setting for au_rslack. Pretty confusing, actually. I say
there should be a new symbolic constant defined for this value; maybe two:
one for integrity and one for privacy; but one large enough for either
is sufficient.

x  Also, there is no maximum size for these structures specified by the
protocol. Based on the RFCs, there is no way for the client to estimate
the initial reply size, and there is no way to sanity check the length
of the trailing GSS information in received GSS-wrapped messages, which
seems like a potential attack vector?
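
Lacking a protocol-defined limit, the most a receiver can do is check
the advertised length against what actually arrived and against a
limit of its own choosing. A minimal sketch of that check, where
max_len is a local policy value (an assumption, not something taken
from the RFCs) and the function name is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode the header of an XDR opaque at buf. Rejects a length that
 * exceeds the local limit max_len or that, once padded, runs past the
 * avail bytes actually received. Returns 0 on success, -1 otherwise. */
static int decode_xdr_opaque(const uint8_t *buf, size_t avail,
			     size_t max_len,
			     const uint8_t **data, uint32_t *len)
{
	uint32_t n;
	size_t padded;

	if (avail < 4)
		return -1;
	n = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
	    ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
	if (n > max_len || n > UINT32_MAX - 3)
		return -1;
	padded = ((size_t)n + 3) & ~(size_t)3;
	if (avail - 4 < padded)
		return -1;

	*data = buf + 4;
	*len = n;
	return 0;
}
```

This only guards against reading past the buffer. It does not say what
a reasonable max_len is, which is the open question here.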

Thus GSS_VERF_SLACK:
- is a good symbolic initial value for au_ralign
- should probably be increased to RPC_MAX_AUTH_SIZE, because that is the
largest size for that field according to RFC 5531 Section 8.2
- should probably not be used as the initial value for au_rslack

Question is still: what maximum value is big enough to guarantee
interoperability? I've looked at libtirpc, no love there. Asking around
internally now.

I'm still interested in seeing what the Wireshark dissector says about
the failing OnTAP FSINFO response.


>>>> Thoughts? Will send an actual patch if no objections to this one.
>>>> 
>>>> diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
>>>> index 24ca861..44ae6bc 100644
>>>> --- a/net/sunrpc/auth_gss/auth_gss.c
>>>> +++ b/net/sunrpc/auth_gss/auth_gss.c
>>>> @@ -50,7 +50,7 @@
>>>> #define GSS_CRED_SLACK         (RPC_MAX_AUTH_SIZE * 2)
>>>> /* length of a krb5 verifier (48), plus data added before arguments when
>>>> * using integrity (two 4-byte integers): */
>>>> -#define GSS_VERF_SLACK         100
>>>> +#define GSS_VERF_SLACK         120
>>>> 
>>>> static DEFINE_HASHTABLE(gss_auth_hash_table, 4);
>>>> static DEFINE_SPINLOCK(gss_auth_hash_lock);
>>> 
>>> --
>>> Chuck Lever
> 
> --
> Chuck Lever

--
Chuck Lever






