Chen Chen added an attachment on Kernel.org Bugzilla:

Created attachment 307283
sar -r mem usage

My RHEL9 server, which runs only the NFS service, often OOMs after a day or two with no userspace memory usage. I switched to the elrepo kernel-lts, but the problem persists. I'm now using 6.1.119-1.el9.elrepo.x86_64. The problem also occurred on (RHEL) 5.14.0-427.40.1.el9_4, (RHEL) 5.14.0-503.14.1.el9_5 and 6.1.115-1.el9.elrepo.x86_64.

I'm not sure it is caused by NFS, but since NFS is the only service running on the server, it is my only suspect.

The server has a Mellanox Technologies MT27500 Family [ConnectX-3] InfiniBand card and NFSoRDMA is enabled. No third-party drivers are used.

The following data were gathered moments before it OOMed and crashed.

sar reported a typical memory leak pattern.

01:20:13 AM  390187300  388732764    3501864   0.89  4856    363952  390344  0.09  100680    358384  17148
01:30:13 AM  379492128  378312768   13642416   3.46  4856    909388  390344  0.09  108844    895740     16
01:40:13 AM  367687716  367062060   24851416   6.30  4856   1498272  390344  0.09  116736   1476672     16
01:50:50 AM  361704244  361471420   30437312   7.72  4856   1888780  390344  0.09  127888   1856036  29912
02:00:13 AM  355796296  355848120   36061648   9.15  4856   2173560  390344  0.09  131544   2137152      0
....
09:00:13 AM    1518392   18089616  373760196  94.79  4760  18648816  390344  0.09  470608  18273412     36
09:10:13 AM    1499980   17223900  374626172  95.01  4740  17801676  390344  0.09  471964  17424672   5292
09:20:13 AM    1561896    6784736  385059756  97.66  1712   7338540  423580  0.10  325452   7070372      0

meminfo also didn't show anything using RAM.

MemTotal:       394292660 kB
MemFree:          1551296 kB
MemAvailable:     6776108 kB
Buffers:             1712 kB
Cached:           7340144 kB
SwapCached:          4308 kB
Active:            325936 kB
Inactive:         7071836 kB
...
KReclaimable:      129816 kB
Slab:              331596 kB
SReclaimable:      129816 kB
SUnreclaim:        201780 kB
...
VmallocUsed:       319528 kB

slabinfo is low. Attached.
vmallocinfo doesn't have much. Attached.

The dmesg log showed that it had killed nearly every userspace program.

[29960.547403] Tasks state (memory values in pages):
[29960.547404] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[29960.547412] [   1020]     0  1020     9498      640    94208     1000         -1000 systemd-udevd
[29960.547417] [   1247]     0  1247   105208     6888   126976        0         -1000 multipathd
[29960.547421] [   1342]     0  1342    23190      330    65536      764         -1000 auditd
[29960.547428] [   1472]     0  1472     4185      806    73728      357         -1000 sshd
[29960.547438] Out of memory and no killable processes...
[29960.547439] Kernel panic - not syncing: System is deadlocked on memory

systemctl status attached. Nothing else is running.

I have a 224G vmcore dump but have no idea how to analyze it, and I think it is too big to upload anywhere.

I would appreciate any help figuring out what went wrong.

File: sar (text/plain)
Size: 6.95 KiB
Link: https://bugzilla.kernel.org/attachment.cgi?id=307283
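
The observation that meminfo does not explain the used memory can be checked with simple arithmetic. The following is a minimal sketch in Python, not part of the original report. It uses only the numbers quoted in the report. Two assumptions: fields that the report elides with "..." are left out, so the result is an upper bound on the unaccounted memory, and the third numeric column of the sar output is assumed to be kbmemused, since the report does not include the sar header. The function names are hypothetical.

```python
from datetime import datetime

# Values in kB, copied from the meminfo excerpt in the report.
# Fields the report elides with "..." are deliberately left out.
# Active/Inactive are skipped because they overlap with Cached.
meminfo_kb = {
    "MemTotal": 394292660,
    "MemFree": 1551296,
    "Buffers": 1712,
    "Cached": 7340144,
    "SwapCached": 4308,
    "Slab": 331596,
    "VmallocUsed": 319528,
}


def unaccounted_kb(info):
    """MemTotal minus free memory and every consumer listed in info."""
    accounted = sum(v for k, v in info.items() if k != "MemTotal")
    return info["MemTotal"] - accounted


# Two sar samples as (timestamp, used kB).
# ASSUMPTION: the third numeric sar column is kbmemused.
samples = [
    ("01:20:13 AM", 3501864),
    ("02:00:13 AM", 36061648),
]


def growth_kb_per_s(first, last):
    """Average growth in kB/s between two same-day samples."""
    fmt = "%I:%M:%S %p"
    t0 = datetime.strptime(first[0], fmt)
    t1 = datetime.strptime(last[0], fmt)
    return (last[1] - first[1]) / (t1 - t0).total_seconds()


if __name__ == "__main__":
    gap = unaccounted_kb(meminfo_kb)
    print(f"unaccounted: {gap} kB ({gap / 1024**2:.1f} GiB)")
    print(f"growth: {growth_kb_per_s(samples[0], samples[1]):.0f} kB/s")
```

Under these assumptions, most of MemTotal is held by something that none of the listed meminfo counters cover, which is consistent with memory allocated directly by the kernel rather than by userspace, the page cache, or slab.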