Hi all! Am 01.03.21 um 18:44 schrieb Chuck Lever: >> On Mar 1, 2021, at 11:28 AM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote: >> >> On Mon, Mar 01, 2021 at 03:20:24PM +0000, Chuck Lever wrote: >>> >>>> On Feb 26, 2021, at 6:04 PM, Daniel Kobras <kobras@xxxxxxxxxxxxx> wrote: >>>> >>>> If an auth module's accept op returns SVC_CLOSE, svc_process_common() >>>> enters a call path that does not call svc_authorise() before leaving the >>>> function, and thus leaks a reference on the auth module's refcount. Hence, >>>> make sure calls to svc_authenticate() and svc_authorise() are paired for >>>> all call paths, to make sure rpc auth modules can be unloaded. >>>> >>>> Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message") >>>> Signed-off-by: Daniel Kobras <kobras@xxxxxxxxxxxxx> >>>> --- >>>> Hi! >>>> >>>> While debugging NFS on a system with misconfigured krb5 settings, we noticed >>>> a suspiciously high refcount on the auth_rpcgss module, despite all of its >>>> consumers already unloaded. I wasn't able to analyze any further on the live >>>> system, but had a look at the code afterwards, and found a path that seems >>>> to leak references if the mechanism's accept() op shuts down a connection >>>> early. Although I couldn't verify, this seem to be a plausible fix. >>>> >>>> Kind regards, >>>> >>>> Daniel >>> >>> Hi Daniel- >>> >>> I've provisionally included your patch in my NFSD for-rc topic branch >>> here: >>> >>> git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git >>> >>> Your bug report seems plausible, but I need to take a closer look at that >>> code and your proposed change. Would very much like to hear from others, >>> too. >> >> So, the effect of this is to call svc_authorise more often. I think >> that's always safe, because svc_authorise is a no-op unless rq_authops >> is set, it clears rq_authops itself, and rq_authops being set is a >> guarantee that ->accept() already ran. >> >> It's harder to know if this solves the problem, as I see a lot of other >> mentions of THIS_MODULE in svcauth_gss.c. > > Perhaps a deeper audit is necessary. > > A small code change to inject SVC_CLOSE returns at random would enable > a more dynamic analysis. I've managed to come up with simple reproducer for this bug: On a working krb5 NFS mount from a test client, check which enctype is used in the ticket for the NFS service. Then unmount, and exclude this enctype from permitted_enctypes in the server's /etc/krb5.conf.[*] Trying to mount again from the test client should now fail (EPERM), and each mount attempt increases the refcount of the server's auth_rpcgss module (by 22 in my test). Exchanging sunrpc.ko for a version with the patch applied, and re-running the same test, the refcount remains constant instead. This confirms the initial analysis, and indicates the fix is actually correct. [*] For a quick test in a standard setup, I've used [libdefaults] permitted_enctypes = aes128-cts-hmac-sha1-96 (...) to make the normal AES256 tickets fail. A more realistic scenario would be a client that's restricted to RC4, and a server with RC4 keys on the KDC, but only AES allowed in krb5.conf. Default behaviour of typical AD join tools makes it easy to end up in a situation like this. >> Possibly orthogonal to this problem, but: svcauth_gss_release >> unconditionally dereferences rqstp->rq_auth_data. Isn't that a NULL >> dereference if the kmalloc at the start of svcauth_gss_accept() fails? >> >> Finally, should we care about module reference leaks? > > I would prefer that module reference counting work as expected. When it > doesn't that tends to lead to people (say, me) hunting for bugs that > might actually be serious. The refcount leak is the easily visible consequence, but the skipped call to svcauth_gss_release() probably also leaks a small amount of memory for each request. Repeating the test case above for a longer period of time (eg. by throwing an automounter into the mix), this might eventually become noticeable. >> Does anyone really *need* to unload modules? > > Anyone who wants to replace the module with a newer build that fixes a > bug. It avoids a full reboot, and for some that's important. Switching from rpc.svcgssd to gssproxy without taking down the machine as a whole was the situation that originally prompted me to look into this, but I admit that's a rather rare use case. >> And will bad stuff happen when the >> count overflows, or does the module code fail safely somehow in the >> overflow case? I know, bugs are bugs, I should care about fixing all of >> them, shame on me.... >> >> --b. >> >>> >>> >>>> net/sunrpc/svc.c | 6 ++++-- >>>> 1 file changed, 4 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c >>>> index 61fb8a18552c..d76dc9d95d16 100644 >>>> --- a/net/sunrpc/svc.c >>>> +++ b/net/sunrpc/svc.c >>>> @@ -1413,7 +1413,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) >>>> >>>> sendit: >>>> if (svc_authorise(rqstp)) >>>> - goto close; >>>> + goto close_xprt; >>>> return 1; /* Caller can now send it */ >>>> >>>> release_dropit: >>>> @@ -1425,6 +1425,8 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) >>>> return 0; >>>> >>>> close: >>>> + svc_authorise(rqstp); >>>> +close_xprt: >>>> if (rqstp->rq_xprt && test_bit(XPT_TEMP, &rqstp->rq_xprt->xpt_flags)) >>>> svc_close_xprt(rqstp->rq_xprt); >>>> dprintk("svc: svc_process close\n"); >>>> @@ -1433,7 +1435,7 @@ svc_process_common(struct svc_rqst *rqstp, struct kvec *argv, struct kvec *resv) >>>> err_short_len: >>>> svc_printk(rqstp, "short len %zd, dropping request\n", >>>> argv->iov_len); >>>> - goto close; >>>> + goto close_xprt; >>>> >>>> err_bad_rpc: >>>> serv->sv_stats->rpcbadfmt++; >>>> -- >>>> 2.25.1 >>>> >>>> >>>> -- >>>> Puzzle ITC Deutschland GmbH >>>> Sitz der Gesellschaft: Eisenbahnstraße 1, 72072 >>>> Tübingen >>>> >>>> Eingetragen am Amtsgericht Stuttgart HRB 765802 >>>> Geschäftsführer: >>>> Lukas Kallies, Daniel Kobras, Mark Pröhl >>>> >>> >>> -- >>> Chuck Lever > > -- > Chuck Lever Kind regards, Daniel -- Daniel Kobras Principal Architect Puzzle ITC Deutschland +49 7071 14316 0 www.puzzle-itc.de -- Puzzle ITC Deutschland GmbH Sitz der Gesellschaft: Eisenbahnstraße 1, 72072 Tübingen Eingetragen am Amtsgericht Stuttgart HRB 765802 Geschäftsführer: Lukas Kallies, Daniel Kobras, Mark Pröhl