Fwd: Possible memory leak on nfsd

Hi -

An NFSD page allocation on v6.1.y is triggering the OOM killer. The reporter
has provided a lot of detail, and we need some help narrowing down
the possible leak culprit. Any takers?

(We've asked the reporter to reproduce on a more recent kernel if
possible).

-------- Forwarded Message --------
Subject: Re: Possible memory leak on nfsd
Date: Thu, 12 Dec 2024 16:00:17 +0000
From: Chuck Lever via Bugspray Bot <bugbot@xxxxxxxxxx>
To: jlayton@xxxxxxxxxx, linux-nfs@xxxxxxxxxxxxxxx, trondmy@xxxxxxxxxx, cel@xxxxxxxxxx, anna@xxxxxxxxxx

Chuck Lever writes via Kernel.org Bugzilla:

From attachment 307290:

[29924.805968] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/user.slice/user-0.slice/user@0.service/init.scope,task=(sd-pam),pid=4503,uid=0
[29924.805991] Out of memory: Killed process 4503 ((sd-pam)) total-vm:173972kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:96kB oom_score_adj:100
[29925.425864] nfsd invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
[29925.425872] CPU: 0 PID: 1874 Comm: nfsd Kdump: loaded Tainted: G E 6.1.119-1.el9.elrepo.x86_64 #1
[29925.425875] Hardware name: Dell Inc. PowerEdge R740/0923K0, BIOS 2.22.2 09/12/2024
[29925.425877] Call Trace:
[29925.425880]  <TASK>
[29925.425885]  dump_stack_lvl+0x45/0x5e
[29925.425893]  dump_header+0x4a/0x213
[29925.425897]  oom_kill_process.cold+0xb/0x10
[29925.425901]  out_of_memory+0xed/0x2e0
[29925.425906]  __alloc_pages_slowpath.constprop.0+0x707/0x9d0
[29925.425916]  __alloc_pages+0x35d/0x370
[29925.425921]  __alloc_pages_bulk+0x3e5/0x680
[29925.425927]  svc_alloc_arg+0x81/0x1f0 [sunrpc]
[29925.425991]  svc_recv+0x1f/0x190 [sunrpc]
[29925.426043]  ? nfsd_inet6addr_event+0x110/0x110 [nfsd]
[29925.426080]  nfsd+0x87/0xc0 [nfsd]
[29925.426113]  kthread+0xe5/0x110
[29925.426118]  ? kthread_complete_and_exit+0x20/0x20
[29925.426122]  ret_from_fork+0x1f/0x30
[29925.426129]  </TASK>

NFSD is triggering the OOM killer because it frequently allocates up to 256 pages at a time to fill the send and receive buffers. It is not necessarily the source of a leak.
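
For a rough sense of scale, the buffer demand implied by "up to 256 pages at a time" can be worked out as below. This is only a sketch: the 4 KiB page size is the x86_64 base page size, while the thread count of 8 is a hypothetical value chosen for illustration and is not taken from the report.

```python
PAGE_SIZE = 4096             # bytes; base page size on x86_64
MAX_PAGES_PER_REQUEST = 256  # upper bound quoted in the analysis above


def buffer_demand_bytes(nfsd_threads, pages=MAX_PAGES_PER_REQUEST,
                        page_size=PAGE_SIZE):
    """Worst case: every nfsd thread holds a full set of buffer pages."""
    return nfsd_threads * pages * page_size


# One thread at the upper bound.
per_thread = buffer_demand_bytes(1)
print(per_thread // 1024, "KiB per thread")

# Hypothetical thread count, for illustration only.
example_threads = 8
print(buffer_demand_bytes(example_threads) // (1024 * 1024),
      "MiB for", example_threads, "threads")
```

At this bound each thread accounts for about one mebibyte of buffer pages, which is consistent with the point that a frequent allocator is simply the most likely caller to hit an OOM condition, whether or not it is the leaker.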

The bulk page allocator is on the slow path here, suggesting there weren't any free pages available on the lists it normally checks first. So it is doing one-at-a-time order-0 allocations, a sign that memory is short.

We see that Node 1 appears to be short on free memory, but the system has not pushed into swap at all. Kernel memory isn't swappable, so whatever is leaking is in the kernel proper.
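
One way to watch per-node free memory over time is to parse the per-node meminfo files under sysfs. The following is a minimal sketch, assuming the usual "Node N Key: value kB" line layout of /sys/devices/system/node/nodeN/meminfo; the function names are hypothetical and the sample numbers in the usage note are made up, not taken from the attachment.

```python
import re

# Matches lines such as "Node 1 MemFree:   50 kB" or "Node 1 Active(anon): 10 kB".
_NODE_LINE = re.compile(r"^Node\s+(\d+)\s+(\w+(?:\(\w+\))?):\s+(\d+)(?:\s+kB)?$")


def parse_node_meminfo(text):
    """Return {node_id: {key: value}} from per-node meminfo text."""
    stats = {}
    for line in text.splitlines():
        match = _NODE_LINE.match(line.strip())
        if match:
            node = int(match.group(1))
            stats.setdefault(node, {})[match.group(2)] = int(match.group(3))
    return stats


def free_fraction(stats, node):
    """Fraction of a node's memory that is currently free."""
    return stats[node]["MemFree"] / stats[node]["MemTotal"]


def read_node_meminfo(node):
    """Read and parse the sysfs meminfo file for one NUMA node."""
    path = "/sys/devices/system/node/node%d/meminfo" % node
    with open(path) as handle:
        return parse_node_meminfo(handle.read())
```

For example, parse_node_meminfo("Node 1 MemTotal: 1000 kB\nNode 1 MemFree: 50 kB") yields a free fraction of 0.05 for node 1. Sampling this periodically on the reporter's system would show whether Node 1 drains steadily while swap stays untouched.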

The slab caches all look reasonably sized, so a slab leak is unlikely.
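
Pages taken directly from the page allocator do not appear in the slab statistics, so a rough cross-check is to subtract the categories that /proc/meminfo does account for from MemTotal and see what is left over. The sketch below is a heuristic only: the choice of fields is an assumption, some categories can overlap or be missing on a given kernel, and the function names are hypothetical.

```python
import re

# Fields treated as "accounted for". This selection is an assumption and
# is neither exhaustive nor free of overlap.
ACCOUNTED_FIELDS = (
    "MemFree", "Buffers", "Cached", "SwapCached", "AnonPages",
    "Slab", "PageTables", "KernelStack", "Percpu", "VmallocUsed",
)

_MEMINFO_LINE = re.compile(r"^(\S+):\s+(\d+)")


def parse_meminfo(text):
    """Return {field: value_in_kB} from /proc/meminfo-style text."""
    info = {}
    for line in text.splitlines():
        match = _MEMINFO_LINE.match(line.strip())
        if match:
            info[match.group(1)] = int(match.group(2))
    return info


def unaccounted_kb(info, fields=ACCOUNTED_FIELDS):
    """MemTotal minus the listed fields; missing fields count as zero."""
    return info["MemTotal"] - sum(info.get(name, 0) for name in fields)


def read_unaccounted_kb(path="/proc/meminfo"):
    """Compute the leftover figure from the live system."""
    with open(path) as handle:
        return unaccounted_kb(parse_meminfo(handle.read()))
```

A leftover figure that keeps growing between samples while Slab stays flat would point towards pages allocated outside the slab caches, which matches the pattern described above.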

At this point we would like someone with MM expertise to come in and help us nail down the leak.

View: https://bugzilla.kernel.org/show_bug.cgi?id=219535#c13
You can reply to this message to join the discussion.

--
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (bugspray 0.1-dev)




