[ ... adding shmem maintainers ... ]

> On Oct 11, 2023, at 12:06 PM, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>
>> On Oct 11, 2023, at 11:52 AM, Vlad Buslov <vladbu@xxxxxxxxxx> wrote:
>>
>> On Wed 11 Oct 2023 at 15:34, Chuck Lever III <chuck.lever@xxxxxxxxxx> wrote:
>>>> On Oct 11, 2023, at 11:15 AM, Vlad Buslov <vladbu@xxxxxxxxxx> wrote:
>>>>
>>>> Hello Chuck,
>>>>
>>>> We have been getting memleaks in offset_ctx->xa in our networking tests:
>>>>
>>>> unreferenced object 0xffff8881004cd080 (size 576):
>>>>   comm "systemd", pid 1, jiffies 4294893373 (age 1992.864s)
>>>>   hex dump (first 32 bytes):
>>>>     00 00 06 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>>>>     38 5c 7c 02 81 88 ff ff 98 d0 4c 00 81 88 ff ff  8\|.......L.....
>>>>   backtrace:
>>>>     [<000000000f554608>] xas_alloc+0x306/0x430
>>>>     [<0000000075537d52>] xas_create+0x4b4/0xc80
>>>>     [<00000000a927aab2>] xas_store+0x73/0x1680
>>>>     [<0000000020a61203>] __xa_alloc+0x1d8/0x2d0
>>>>     [<00000000ae300af2>] __xa_alloc_cyclic+0xf1/0x310
>>>>     [<000000001032332c>] simple_offset_add+0xd8/0x170
>>>>     [<0000000073229fad>] shmem_mknod+0xbf/0x180
>>>>     [<00000000242520ce>] vfs_mknod+0x3b0/0x5c0
>>>>     [<000000001ef218dd>] unix_bind+0x2c2/0xdb0
>>>>     [<0000000009b9a8dd>] __sys_bind+0x127/0x1e0
>>>>     [<000000003c949fbb>] __x64_sys_bind+0x6e/0xb0
>>>>     [<00000000b8a767c7>] do_syscall_64+0x3d/0x90
>>>>     [<000000006132ae0d>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
>>>>
>>>> It looks like those may be caused by recent commit 6faddda69f62 ("libfs:
>>>> Add directory operations for stable offsets")
>>>
>>> That sounds plausible.
>>>
>>>
>>>> but we don't have a proper
>>>> reproduction, just sometimes arbitrary getting the memleak complains
>>>> during/after the regression run.
>>>
>>> If the leak is a trickle rather than a flood, than can you take
>>> some time to see if you can narrow down a reproducer? If it's a
>>> flood, I can look at this immediately.
>>
>> No, it is not a flood, we are not getting setups ran out of memory
>> during testing or anything. However, I don't have any good idea how to
>> narrow down the repro since as you can see from memleak trace it is a
>> result of some syscall performed by systemd and none of our tests do
>> anything more advanced with it than 'systemctl restart ovs-vswitchd'.
>> Basically it is a setup with Fedora and an upstream kernel that executes
>> bunch of network offload tests with Open vSwitch, iproute2 tc, Linux
>> bridge, etc.
>
> OK, I'll see what I can do for a reproducer. Thank you for the
> report.

I've had kmemleak enabled on several systems for a week, and there
have been no tmpfs-related leaks detected. That suggests we don't
have a problem with normal workloads.

My next step is to go look at the ovs-vswitchd.service unit to
see if there are any leads there. We might ask Lennart or the
VSwitch folks if they have any suggestions too.

Meantime, can I ask that you open a bug on bugzilla.kernel.org
where we can collect troubleshooting information? Looks like
"Memory Management / Other" is appropriate for shmem, and Hugh
or Andrew can re-assign ownership to me.

--
Chuck Lever