Re: [PATCH] sunrpc: fix refcount leak for rpc auth modules

Excellent, Daniel. Thanks for following up! I will add a Link: tag
to this thread in the patch description.


> On Mar 2, 2021, at 6:50 AM, Daniel Kobras <kobras@xxxxxxxxxxxxx> wrote:
> 
> Hi all!
> 
> On 01.03.21 at 18:44, Chuck Lever wrote:
>>> On Mar 1, 2021, at 11:28 AM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>>> 
>>> On Mon, Mar 01, 2021 at 03:20:24PM +0000, Chuck Lever wrote:
>>>> 
>>>>> On Feb 26, 2021, at 6:04 PM, Daniel Kobras <kobras@xxxxxxxxxxxxx> wrote:
>>>>> 
>>>>> If an auth module's accept op returns SVC_CLOSE, svc_process_common()
>>>>> enters a call path that does not call svc_authorise() before leaving the
>>>>> function, and thus leaks a reference on the auth module's refcount. Hence,
>>>>> make sure calls to svc_authenticate() and svc_authorise() are paired for
>>>>> all call paths, to make sure rpc auth modules can be unloaded.
>>>>> 
>>>>> Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message")
>>>>> Signed-off-by: Daniel Kobras <kobras@xxxxxxxxxxxxx>
>>>>> ---
>>>>> Hi!
>>>>> 
>>>>> While debugging NFS on a system with misconfigured krb5 settings, we noticed
>>>>> a suspiciously high refcount on the auth_rpcgss module, despite all of its
>>>>> consumers having been unloaded. I wasn't able to analyze further on the live
>>>>> system, but had a look at the code afterwards, and found a path that seems
>>>>> to leak references if the mechanism's accept() op shuts down a connection
>>>>> early. Although I couldn't verify it, this seems to be a plausible fix.
>>>>> 
>>>>> Kind regards,
>>>>> 
>>>>> Daniel
>>>> 
>>>> Hi Daniel-
>>>> 
>>>> I've provisionally included your patch in my NFSD for-rc topic branch
>>>> here:
>>>> 
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
>>>> 
>>>> Your bug report seems plausible, but I need to take a closer look at that
>>>> code and your proposed change. Would very much like to hear from others,
>>>> too.
>>> 
>>> So, the effect of this is to call svc_authorise more often.  I think
>>> that's always safe, because svc_authorise is a no-op unless rq_authops
>>> is set, it clears rq_authops itself, and rq_authops being set is a
>>> guarantee that ->accept() already ran.
>>> 
>>> It's harder to know if this solves the problem, as I see a lot of other
>>> mentions of THIS_MODULE in svcauth_gss.c.
>> 
>> Perhaps a deeper audit is necessary.
>> 
>> A small code change to inject SVC_CLOSE returns at random would enable
>> a more dynamic analysis.
> 
> I've managed to come up with a simple reproducer for this bug:
> 
> On a working krb5 NFS mount from a test client, check which enctype is
> used in the ticket for the NFS service. Then unmount, and exclude this
> enctype from permitted_enctypes in the server's /etc/krb5.conf.[*]
> Trying to mount again from the test client should now fail (EPERM), and
> each mount attempt increases the refcount of the server's auth_rpcgss
> module (by 22 in my test).
> 
> Exchanging sunrpc.ko for a version with the patch applied, and
> re-running the same test, the refcount remains constant instead. This
> confirms the initial analysis, and indicates the fix is actually correct.
> 
> [*] For a quick test in a standard setup, I've used
>      [libdefaults]
>      permitted_enctypes = aes128-cts-hmac-sha1-96
>      (...)
>    to make the normal AES256 tickets fail. A more realistic scenario
>    would be a client that's restricted to RC4, and a server with
>    RC4 keys on the KDC, but only AES allowed in krb5.conf. Default
>    behaviour of typical AD join tools makes it easy to end up in a
>    situation like this.
> 
>>> Possibly orthogonal to this problem, but: svcauth_gss_release
>>> unconditionally dereferences rqstp->rq_auth_data.  Isn't that a NULL
>>> dereference if the kmalloc at the start of svcauth_gss_accept() fails?
>>> 
>>> Finally, should we care about module reference leaks?
>> 
>> I would prefer that module reference counting work as expected. When it
>> doesn't, that tends to lead to people (say, me) hunting for bugs that
>> might actually be serious.
> 
> The refcount leak is the easily visible consequence, but the skipped
> call to svcauth_gss_release() probably also leaks a small amount of
> memory for each request. Repeating the test case above for a longer
> period of time (e.g. by throwing an automounter into the mix), this might
> eventually become noticeable.
> 
>>> Does anyone really *need* to unload modules?
>> 
>> Anyone who wants to replace the module with a newer build that fixes a
>> bug. It avoids a full reboot, and for some that's important.
> 
> Switching from rpc.svcgssd to gssproxy without taking down the machine
> as a whole was the situation that originally prompted me to look into
> this, but I admit that's a rather rare use case.
> 
>>> And will bad stuff happen when the
>>> count overflows, or does the module code fail safely somehow in the
>>> overflow case?  I know, bugs are bugs, I should care about fixing all of
>>> them, shame on me....
>>> 
>>> --b.
>>> 
>>>> 
>>>> 
>>>>> net/sunrpc/svc.c | 6 ++++--
>>>>> 1 file changed, 4 insertions(+), 2 deletions(-)
>>>>> 
>>>>> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
>>>>> index 61fb8a18552c..d76dc9d95d16 100644
>>>>> --- a/net/sunrpc/svc.c
>>>>> +++ b/net/sunrpc/svc.c
>>>>> @@ -1413,7 +1413,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>>>> 
>>>>> sendit:
>>>>> 	if (svc_authorise(rqstp))
>>>>> -		goto close;
>>>>> +		goto close_xprt;
>>>>> 	return 1;		/* Caller can now send it */
>>>>> 
>>>>> release_dropit:
>>>>> @@ -1425,6 +1425,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>>>> 	return 0;
>>>>> 
>>>>> close:
>>>>> +	svc_authorise(rqstp);
>>>>> +close_xprt:
>>>>> 	if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags))
>>>>> 		svc_close_xprt(rqstp->rq_xprt);
>>>>> 	dprintk("svc: svc_process close\n");
>>>>> @@ -1433,7 +1435,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv)
>>>>> err_short_len:
>>>>> 	svc_printk(rqstp, "short len %zd, dropping request\n",
>>>>> 			argv->iov_len);
>>>>> -	goto close;
>>>>> +	goto close_xprt;
>>>>> 
>>>>> err_bad_rpc:
>>>>> 	serv->sv_stats->rpcbadfmt++;
>>>>> -- 
>>>>> 2.25.1
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Puzzle ITC Deutschland GmbH
>>>>> Registered office: Eisenbahnstraße 1, 72072 Tübingen
>>>>> 
>>>>> Registered at Amtsgericht Stuttgart, HRB 765802
>>>>> Managing directors: Lukas Kallies, Daniel Kobras, Mark Pröhl
>>>>> 
>>>> 
>>>> --
>>>> Chuck Lever
>> 
>> --
>> Chuck Lever
> 
> Kind regards,
> 
> Daniel
> -- 
> Daniel Kobras
> Principal Architect
> Puzzle ITC Deutschland
> +49 7071 14316 0
> www.puzzle-itc.de

--
Chuck Lever
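
For readers following the thread, the pairing argument can be illustrated
with a small user-space model. This is only a sketch, not kernel code: every
model_* name below is hypothetical, the refcount is a plain int rather than a
real module reference, and the control flow is reduced to the one path under
discussion (accept returns a "close" verdict). It assumes, as described in
the thread, that the authorise step is a no-op unless the per-request ops
pointer is set and that it clears the pointer itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the verdicts returned by an accept op. */
enum model_verdict { MODEL_OK, MODEL_CLOSE };

/* Stands in for an auth module; refcount models the module refcount. */
struct model_auth_ops {
	int refcount;
};

/* Stands in for the request; authops models the rq_authops field. */
struct model_rqst {
	struct model_auth_ops *authops;
};

/* Model of the authenticate step: take a reference, remember the ops. */
static enum model_verdict model_authenticate(struct model_rqst *rq,
					     struct model_auth_ops *ops,
					     enum model_verdict accept_result)
{
	ops->refcount++;
	rq->authops = ops;
	return accept_result;
}

/* Model of the authorise step: no-op unless authops is set, clears it. */
static void model_authorise(struct model_rqst *rq)
{
	if (rq->authops == NULL)
		return;
	assert(rq->authops->refcount > 0);
	rq->authops->refcount--;
	rq->authops = NULL;
}

/* One request. 'patched' selects whether the close path drops the ref. */
static int model_process(struct model_auth_ops *ops,
			 enum model_verdict accept_result, int patched)
{
	struct model_rqst rq = { .authops = NULL };

	if (model_authenticate(&rq, ops, accept_result) == MODEL_CLOSE)
		goto close;

	/* "sendit" path: reference is always dropped here. */
	model_authorise(&rq);
	return 1;

close:
	if (patched)
		model_authorise(&rq);	/* the call the patch adds */
	return 0;
}

/* Run 'requests' requests and report the references still held. */
int model_refs_after(int requests, enum model_verdict accept_result,
		     int patched)
{
	struct model_auth_ops ops = { .refcount = 0 };
	int i;

	for (i = 0; i < requests; i++)
		model_process(&ops, accept_result, patched);
	return ops.refcount;
}
```

In this model, every request that takes the close path without the added
call leaves one reference behind, while the patched variant returns to zero.
Because the authorise step clears the ops pointer, calling it on a path where
it already ran (or where authenticate never ran) changes nothing, which is
the safety property Bruce describes above.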

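To watch the refcount while repeating Daniel's reproducer, something like
the following helper could be used. It is a sketch under assumptions: the
function name read_module_refcnt is hypothetical, and the path
/sys/module/auth_rpcgss/refcnt is assumed to exist, which requires the module
to be loaded and the kernel to be built with module unloading support. The
same number is shown in the "Used by" column of lsmod.

```c
#include <stdio.h>

/*
 * Read a single integer from 'path', e.g. a module's sysfs refcnt file.
 * Returns the value read, or -1 if the file cannot be opened or parsed.
 * (Hypothetical helper, not part of any kernel or library API.)
 */
long read_module_refcnt(const char *path)
{
	FILE *f = fopen(path, "r");
	long value;

	if (f == NULL)
		return -1;
	if (fscanf(f, "%ld", &value) != 1)
		value = -1;
	fclose(f);
	return value;
}
```

Calling read_module_refcnt("/sys/module/auth_rpcgss/refcnt") on the server
before and after a failed mount attempt, and comparing the two values, would
show whether the count grows (unpatched sunrpc.ko) or stays constant
(patched).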





