Before doing this on my own build, I tried it with the unmodified linux-image-6.6.13+bpo-amd64 from Debian 12.

I installed systemtap, linux-headers-6.6.13+bpo-amd64 and linux-image-6.6.13+bpo-amd64-dbg and tried to run stap:

user@deb:~$ sudo stap -v --all-modules kmem_alloc.stp nfsd_file
WARNING: Kernel function symbol table missing [man warning::symbols]
Pass 1: parsed user script and 484 library scripts using 110120virt/96896res/7168shr/89800data kb, in 1360usr/1080sys/4963real ms.
WARNING: cannot find module kernel debuginfo: No DWARF information found [man warning::debuginfo]
semantic error: resolution failed in DWARF builder

semantic error: while resolving probe point: identifier 'kernel' at kmem_alloc.stp:5:7
        source: probe kernel.function("kmem_cache_alloc") {
                      ^

semantic error: no match

Pass 2: analyzed script: 1 probe, 5 functions, 1 embed, 3 globals using 112132virt/100352res/8704shr/91792data kb, in 30usr/30sys/167real ms.
Pass 2: analysis failed. [man error::pass2]
Tip: /usr/share/doc/systemtap/README.Debian should help you get started.
user@deb:~$

user@deb:~$ grep -E 'CONFIG_DEBUG_INFO|CONFIG_KPROBES|CONFIG_DEBUG_FS|CONFIG_RELAY' /boot/config-6.6.13+bpo-amd64
CONFIG_RELAY=y
CONFIG_KPROBES=y
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_INFO_NONE is not set
CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
# CONFIG_DEBUG_INFO_DWARF4 is not set
# CONFIG_DEBUG_INFO_DWARF5 is not set
# CONFIG_DEBUG_INFO_REDUCED is not set
CONFIG_DEBUG_INFO_COMPRESSED_NONE=y
# CONFIG_DEBUG_INFO_COMPRESSED_ZLIB is not set
# CONFIG_DEBUG_INFO_SPLIT is not set
CONFIG_DEBUG_INFO_BTF=y
CONFIG_DEBUG_INFO_BTF_MODULES=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_FS_ALLOW_ALL=y
# CONFIG_DEBUG_FS_DISALLOW_MOUNT is not set
# CONFIG_DEBUG_FS_ALLOW_NONE is not set
user@deb:~$

Do I need to enable other options?

> Sent: Tuesday, 26.03.2024 at 12:15
> From: "Benjamin Coddington" <bcodding@xxxxxxxxxx>
> To: "Chuck Lever III" <chuck.lever@xxxxxxxxxx>
> Cc: "Jan Schunk" <scpcom@xxxxxx>, "Jeff Layton" <jlayton@xxxxxxxxxx>, "Neil Brown" <neilb@xxxxxxx>, "Olga Kornievskaia" <kolga@xxxxxxxxxx>, "Dai Ngo" <dai.ngo@xxxxxxxxxx>, "Tom Talpey" <tom@xxxxxxxxxx>, "Linux NFS Mailing List" <linux-nfs@xxxxxxxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [External] : nfsd: memory leak when client does many file operations
>
> On 25 Mar 2024, at 16:11, Chuck Lever III wrote:
>
> >> On Mar 25, 2024, at 3:55 PM, Jan Schunk <scpcom@xxxxxx> wrote:
> >>
> >> The VM is now running 20 hours with 512MB RAM, no desktop, without the "noatime" mount option and without the "async" export option.
> >>
> >> Currently there is no issue, but the memory usage is still constantly growing. It may just take longer before something happens.
> >>
> >> top - 00:49:49 up 3 min, 1 user, load average: 0,21, 0,19, 0,09
> >> Tasks: 111 total, 1 running, 110 sleeping, 0 stopped, 0 zombie
> >> %CPU(s): 0,2 us, 0,3 sy, 0,0 ni, 99,5 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
> >> MiB Spch: 467,0 total, 302,3 free, 89,3 used, 88,1 buff/cache
> >> MiB Swap: 975,0 total, 975,0 free, 0,0 used. 377,7 avail Spch
> >>
> >> top - 15:05:39 up 14:19, 1 user, load average: 1,87, 1,72, 1,65
> >> Tasks: 104 total, 1 running, 103 sleeping, 0 stopped, 0 zombie
> >> %CPU(s): 0,2 us, 4,9 sy, 0,0 ni, 53,3 id, 39,0 wa, 0,0 hi, 2,6 si, 0,0 st
> >> MiB Spch: 467,0 total, 21,2 free, 147,1 used, 310,9 buff/cache
> >> MiB Swap: 975,0 total, 952,9 free, 22,1 used. 319,9 avail Spch
> >>
> >> top - 20:48:16 up 20:01, 1 user, load average: 5,02, 2,72, 2,08
> >> Tasks: 104 total, 5 running, 99 sleeping, 0 stopped, 0 zombie
> >> %CPU(s): 0,2 us, 46,4 sy, 0,0 ni, 11,9 id, 2,3 wa, 0,0 hi, 39,2 si, 0,0 st
> >> MiB Spch: 467,0 total, 16,9 free, 190,8 used, 271,6 buff/cache
> >> MiB Swap: 975,0 total, 952,9 free, 22,1 used. 276,2 avail Spch
> >
> > I don't see anything in your original memory dump that
> > might account for this. But I'm at a loss because I'm
> > a kernel developer, not a support guy -- I don't have
> > any tools or expertise that can troubleshoot a system
> > without rebuilding a kernel with instrumentation. My
> > first instinct is to tell you to bisect between v6.3
> > and v6.4, or at least enable kmemleak, but I'm guessing
> > you don't build your own kernels.
> >
> > My only recourse at this point would be to try to
> > reproduce it myself, but unfortunately I've just
> > upgraded my whole lab to Fedora 39, and there's a grub
> > bug that prevents booting any custom-built kernel
> > on my hardware.
> >
> > So I'm stuck until I can nail that down. Anyone else
> > care to help out?
>
> Sure - I can throw some stuff..
>
> Can we dig into which memory slabs might be growing? Something like:
>
> watch -d "cat /proc/slabinfo | grep nfsd"
>
> .. for a bit might show what is growing.
>
> Then use a systemtap script like the one below to trace the allocations - use:
>
> stap -v --all-modules kmem_alloc.stp <slab_name>
>
> Ben
>
>
> 8<---------------------------- save as kmem_alloc.stp ----------------------------
>
> # This script displays the number of given slab allocations and the backtraces leading up to it.
>
> global slab = @1
> global stats, stacks
> probe kernel.function("kmem_cache_alloc") {
>     if (kernel_string($s->name) == slab) {
>         stats[execname()] <<< 1
>         stacks[execname(),kernel_string($s->name),backtrace()] <<< 1
>     }
> }
> # Exit after 10 seconds
> # probe timer.ms(10000) { exit () }
> probe end {
>     printf("Number of %s slab allocations by process\n", slab)
>     foreach ([exec] in stats) {
>         printf("%s:\t%d\n",exec,@count(stats[exec]))
>     }
>     printf("\nBacktrace of processes when allocating\n")
>     foreach ([proc,cache,bt] in stacks) {
>         printf("Exec: %s Name: %s Count: %d\n",proc,cache,@count(stacks[proc,cache,bt]))
>         print_stack(bt)
>         printf("\n-------------------------------------------------------\n\n")
>     }
> }
>
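
The slab check suggested in the quoted mail (watching /proc/slabinfo for growing nfsd caches) can also be done offline by comparing two saved snapshots, which does not depend on stap or debuginfo. The following is only a minimal sketch under stated assumptions: the function name slab_growth and the snapshot paths are hypothetical, and it relies only on the standard /proc/slabinfo layout, where column 1 is the cache name and column 2 is active_objs.

```shell
#!/bin/sh
# slab_growth OLD NEW [PATTERN]   (hypothetical helper, not part of any package)
# Print caches whose active object count (column 2 of /proc/slabinfo)
# increased between two saved snapshots, largest growth first.
slab_growth() {
    old=$1
    new=$2
    pattern=${3:-.}
    awk -v pat="$pattern" '
        # Skip the header lines ("slabinfo - version: ..." and "# name ...").
        /^slabinfo/ || /^#/ { next }
        # First file: remember active_objs for every cache name.
        NR == FNR { before[$1] = $2 + 0; next }
        # Second file: report matching caches whose count went up.
        $1 ~ pat && ($1 in before) && ($2 + 0) > before[$1] {
            printf "%d\t%s\t%d -> %d\n", $2 - before[$1], $1, before[$1], $2
        }
    ' "$old" "$new" | sort -rn
}

# Example use as root (paths are placeholders):
#   cat /proc/slabinfo > /tmp/slab.1
#   sleep 600
#   cat /proc/slabinfo > /tmp/slab.2
#   slab_growth /tmp/slab.1 /tmp/slab.2 nfsd
```

Each output line has the form "growth<TAB>cache<TAB>old -> new". Leaving out the pattern lists every cache that grew, which may help if the leak is not in a cache with "nfsd" in its name.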