On Tue, 2018-10-30 at 21:58 +0800, zhong jiang wrote:
> On 2018/10/30 21:06, Benjamin Coddington wrote:
> > Hi zhong jiang,
> >
> > Try asking in linux-nfs.. but I'll also note that 3.10-stable may
> > be missing a number of fixes to leaks in the NFS GSS code.
> >
> > I can see more than a few fixes to memory leaks with:
> > git log --grep=leak --oneline net/sunrpc/auth_gss/
> >
>
> Thanks for your reply. I have tested some of them from upstream, as
> you suggested, but they fail to solve the issue completely.
> Hence, I am asking the relevant experts whether they have run into
> this issue or can offer any suggestions.
>
> Thanks,
> zhong jiang
> > Ben
> >
> > On 30 Oct 2018, at 8:45, zhong jiang wrote:
> >
> > > Hi, Herbert
> > >
> > > Recently, I hit a memory leak issue when mounting and
> > > unmounting nfs with the way of krb5.
> > > The issue happens to the linux-3.10-stable.
> > >
> > > I find that slab-1024 and slab-512 will take up most of the
> > > memory. And it can not be freed.
> > > Meanwhile, it result in rpcsec_gss_krb5 can be unregistered as
> > > well.
> > >

Are you running the latest 3.10-stable? This sounds very similar to
something I encountered a while ago, and it was a sunrpc cache related
problem. The patch that fixed it for me is in 3.10.106 though.

Can you check if this cache is growing indefinitely?

    /proc/net/rpc/auth.rpcsec.context

If it is large, try to flush explicitly with:

    date +%s > /proc/net/rpc/auth.rpcsec.context/flush

If all that checks out, you may need the below upstream fix, but it
went into v3.10.106 as

6a4a5fd svcrpc: don't leak contexts on PROC_DESTROY

commit 6a4a5fd4c7bc6a06ca26ad7327d046d8d3c0932a
Author: J. Bruce Fields <bfields@xxxxxxxxxx>
Date:   Mon Jan 9 17:15:18 2017 -0500

    svcrpc: don't leak contexts on PROC_DESTROY

    commit 78794d1890708cf94e3961261e52dcec2cc34722 upstream.

    Context expiry times are in units of seconds since boot, not unix
    time. The use of get_seconds() here therefore sets the expiry time
    decades in the future. This prevents timely freeing of contexts
    destroyed by client RPC_GSS_PROC_DESTROY requests. We'd still free
    them eventually (when the module is unloaded or the container shut
    down), but a lot of contexts could pile up before then.

    Fixes: c5b29f885afe "sunrpc: use seconds since boot in expiry cache"
    Reported-by: Andy Adamson <andros@xxxxxxxxxx>
    Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxx>
    Signed-off-by: Willy Tarreau <w@xxxxxx>

diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c
index 62663a0..e625efe 100644
--- a/net/sunrpc/auth_gss/svcauth_gss.c
+++ b/net/sunrpc/auth_gss/svcauth_gss.c
@@ -1518,7 +1518,7 @@ static void destroy_use_gss_proxy_proc_entry(struct net *net) {}
 	case RPC_GSS_PROC_DESTROY:
 		if (gss_write_verf(rqstp, rsci->mechctx, gc->gc_seq))
 			goto auth_err;
-		rsci->h.expiry_time = get_seconds();
+		rsci->h.expiry_time = seconds_since_boot();
 		set_bit(CACHE_NEGATIVE, &rsci->h.flags);
 		if (resv->iov_len + 4 > PAGE_SIZE)
 			goto drop;