Possible memory leak on nfsd

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Chen Chen added an attachment on Kernel.org Bugzilla:

Created attachment 307283
sar -r mem usage

My RHEL9 server with only NFS service often OOMed after a day or two, with no userspace memory usage. So I switched to elrepo kernel-lts and still the problem persists.

I'm now using 6.1.119-1.el9.elrepo.x86_64. The problem also occured on (RHEL) 5.14.0-427.40.1.el9_4, (RHEL) 5.14.0-503.14.1.el9_5 and 6.1.115-1.el9.elrepo.x86_64.

I'm not so sure it is caused by NFS but since it is the only service running on the server I can only suspect it is the culprit. The server has a Mellanox Technologies MT27500 Family [ConnectX-3] Infiniband Card and NFSoRMDA is enabled. No 3rd drivers used.

The following data were gathered moments before it OOMed and crashed

sar reported a typical memory leak appearance.
01:20:13 AM 390187300 388732764   3501864      0.89      4856    363952    390344      0.09    100680    358384     17148
01:30:13 AM 379492128 378312768  13642416      3.46      4856    909388    390344      0.09    108844    895740        16
01:40:13 AM 367687716 367062060  24851416      6.30      4856   1498272    390344      0.09    116736   1476672        16
01:50:50 AM 361704244 361471420  30437312      7.72      4856   1888780    390344      0.09    127888   1856036     29912
02:00:13 AM 355796296 355848120  36061648      9.15      4856   2173560    390344      0.09    131544   2137152         0
....
09:00:13 AM   1518392  18089616 373760196     94.79      4760  18648816    390344      0.09    470608  18273412        36
09:10:13 AM   1499980  17223900 374626172     95.01      4740  17801676    390344      0.09    471964  17424672      5292
09:20:13 AM   1561896   6784736 385059756     97.66      1712   7338540    423580      0.10    325452   7070372         0

meminfo also didn't show anything using ram.
MemTotal:       394292660 kB
MemFree:         1551296 kB
MemAvailable:    6776108 kB
Buffers:            1712 kB
Cached:          7340144 kB
SwapCached:         4308 kB
Active:           325936 kB
Inactive:        7071836 kB
...
KReclaimable:     129816 kB
Slab:             331596 kB
SReclaimable:     129816 kB
SUnreclaim:       201780 kB
...
VmallocUsed:      319528 kB

slabinfo is low. Attached.

vmallocinfo doesn't have much. Attached.

dmesg log showed it has killed nearly every userspace programs.
[29960.547403] Tasks state (memory values in pages):
[29960.547404] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[29960.547412] [   1020]     0  1020     9498      640    94208     1000         -1000 systemd-udevd
[29960.547417] [   1247]     0  1247   105208     6888   126976        0         -1000 multipathd
[29960.547421] [   1342]     0  1342    23190      330    65536      764         -1000 auditd
[29960.547428] [   1472]     0  1472     4185      806    73728      357         -1000 sshd
[29960.547438] Out of memory and no killable processes...
[29960.547439] Kernel panic - not syncing: System is deadlocked on memory

systemctl status attached. Nothing else is running.

I have a 224G vmcore dump but have no idea how to deal with it. And it is too big to upload somewhere I think.

I appreciate any help to help me detect what went wrong.

File: sar (text/plain)
Size: 6.95 KiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307283
---
sar -r mem usage

You can reply to this message to join the discussion.
-- 
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (bugspray 0.1-dev)





[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux