Re: massive memory leak in 3.1[3-5] with nfs4+kerberos

On Sat, Oct 11, 2014 at 12:36:27AM -0300, Carlos Carvalho wrote:
> We're observing a big memory leak in 3.1[3-5]. We've gone until 3.15.8 and back
> to 3.14 because of LTS. Today we're running 3.14.21. The problem has existed
> for several months but recently has become a show-stopper.

Is there an older version that you know was OK?

> Here are the values of SUnreclaim: from /proc/meminfo, sampled at every 4h
> (units are kB):
> 
> 87192
> 297044
> 765320
> 2325160
> 3306056
> 4412808
> 4799292
> 5085392
> 4999936
> 5521648
> 6628496
> 7785460
> 8518084
> 8988404
> 9141220
> 9533224
> 10053484
> 10954000
> 11716700
> 12369516
> 12847412
> 13318872
> 13846196
> 14339476
> 14815600
> 15293564
> 15798024
> 17092772
> 19240084
> 21679888
> 22399060
> 22943812
> 23407004
> 24049804
> 26210880
> 28034980
> 29059812  <== almost 30GB!

Can you figure out from /proc/slabinfo which slab is the problem?
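
One rough way to answer that is to rank the caches in /proc/slabinfo by
approximate size and see which one keeps growing between samples. The
sketch below is only an illustration: it assumes the "slabinfo - version:
2.1" column layout and a 4 kB page size by default, and the names
rank_slabs and top_slabs are made up for this example.

```python
import os


def rank_slabs(slabinfo_text, page_size=4096):
    """Return (cache_name, approx_kB) pairs, largest first.

    Assumes the slabinfo 2.1 layout:
    name active_objs num_objs objsize objperslab pagesperslab
      : tunables ... : slabdata active_slabs num_slabs sharedavail
    """
    usage = []
    for line in slabinfo_text.splitlines():
        # Skip the version line, the column header, and blank lines.
        if line.startswith("slabinfo") or line.startswith("#") or not line.strip():
            continue
        head, _, rest = line.partition(" : ")
        fields = head.split()
        if len(fields) < 6 or "slabdata" not in rest:
            continue
        pages_per_slab = int(fields[5])
        num_slabs = int(rest.split("slabdata", 1)[1].split()[1])
        # Approximate footprint: slabs * pages per slab * page size.
        usage.append((fields[0], num_slabs * pages_per_slab * page_size // 1024))
    usage.sort(key=lambda item: item[1], reverse=True)
    return usage


def top_slabs(path="/proc/slabinfo", count=15):
    """Read slabinfo (usually needs root) and return the largest caches."""
    with open(path) as f:
        text = f.read()
    return rank_slabs(text, os.sysconf("SC_PAGE_SIZE"))[:count]
```

Calling top_slabs() as root every few hours, alongside the SUnreclaim
samples, and comparing the output should show which cache accounts for
the growth.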

> After a few days the machine has lost so much memory that it panics or becomes
> very slow due to lack of cache and we have to reboot it. It's a busy file
> server of home directories.
> 
> We have several other busy servers (including identical hardware) but the
> memory leak happens only in this machine. What is different with it is that
> it's the only place where we use:
> - nfs4 with authentication and encryption by kerberos
> - raid10
> 
> All others do only nfs3 or no nfs, and raid6. That's why we suspect it's a nfs4
> problem.

It would also be interesting to know whether the problem is with nfs4 or
krb5.  But I don't know if you have an easy way to test that.  (E.g.
temporarily downgrade to nfs3 while keeping krb5 and see if that
matters?)

Do you know if any of your clients are using NFSv4.1?

What filesystem are you exporting, with what options?

> What about these patches: http://permalink.gmane.org/gmane.linux.nfs/62012
> Bruce said they were accepted but they're not in 3.14. Were they rejected or
> forgotten? Could they have any relation to this memory leak?

Those are in 3.15.

There'd be no harm in trying them, but on a quick skim I don't think
they're likely to explain your symptoms.

--b.