OOM killer triggered by CIFS mount

Hi,

On our embedded Linux device, we mount a Samba share and periodically write small files to it (around 70 KB, 20 times a second).
After several hours (or after roughly 400'000 files have been written) the OOM killer starts to kill processes until it eventually also kills the writing process,
rendering the device unusable.

I was able to reproduce the issue with a simple shell script that writes small files with random data to the Samba share in a loop using dd (sketched below).
The Samba server (version 4.15.13 on Ubuntu 20.04) is in the same LAN as the device, so throughput and latency are not an issue.
The device is a TI AM62x SoC (Arm Cortex-A53, basically a BeaglePlay board https://www.beagleboard.org/boards/beagleplay) running
kernel 6.6.32 from TI with realtime patches (built with Yocto using the meta-ti layer).
On the device the share is mounted with BusyBox mount (v1.36.1) and the following (default) options:
//192.168.103.126/share on /home/root/share type cifs (rw,relatime,vers=3.1.1,sec=none,cache=strict,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.103.126,file_mode=0755,dir_mode=0755,
soft,nounix,serverino,mapposix,reparse=nfs,rsize=4194304,wsize=4194304,bsize=1048576,retrans=1,echo_interval=60,actimeo=1,closetimeo=1)
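
The reproduction loop is essentially the following (file names, counter limit and pacing are illustrative; the file size and rate match our workload):

#!/bin/sh
# Write ~70 KB files of random data to the mounted share,
# at roughly 20 files per second.
DIR=/home/root/share
i=0
while [ "$i" -lt 400000 ]; do
    dd if=/dev/urandom of="$DIR/file_$i.bin" bs=1K count=70 2>/dev/null
    i=$((i + 1))
    sleep 0.05
done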

When the OOM killer is eventually triggered, dmesg contains reports like:
[1527133.672369] test_samba_blfl invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=2, oom_score_adj=0
[1527133.672431] CPU: 1 PID: 14450 Comm: test_samba_blfl Tainted: G           O       6.6.32-rt32-ti-rt-01519-g2cc066b2c5d1-dirty #1
[1527133.672443] Hardware name: Kistler EVK with Skyboard V2 using AM62x (DT)
[1527133.672449] Call trace:
[1527133.672454]  dump_backtrace+0xa8/0x118
[1527133.672478]  show_stack+0x1c/0x30
[1527133.672486]  dump_stack_lvl+0x44/0x58
[1527133.672497]  dump_stack+0x14/0x20
[1527133.672504]  dump_header+0x4c/0x2c8
[1527133.672514]  oom_kill_process+0x364/0x560
[1527133.672521]  out_of_memory+0xac/0x460
[1527133.672527]  __alloc_pages+0x94c/0xcb8
[1527133.672541]  copy_process+0x168/0x12d0
[1527133.672549]  kernel_clone+0x88/0x388
[1527133.672555]  __do_sys_clone+0x5c/0x78
[1527133.672560]  __arm64_sys_clone+0x24/0x38
[1527133.672567]  el0_svc_common.constprop.0+0x60/0x138
[1527133.672576]  do_el0_svc+0x20/0x30
[1527133.672584]  el0_svc+0x18/0x50
[1527133.672592]  el0t_64_sync_handler+0x118/0x128
[1527133.672601]  el0t_64_sync+0x14c/0x150
[1527133.672626] Mem-Info:
[1527133.672632] active_anon:103 inactive_anon:909 isolated_anon:0
[1527133.672632]  active_file:365 inactive_file:320316 isolated_file:1
[1527133.672632]  unevictable:0 dirty:0 writeback:0
[1527133.672632]  slab_reclaimable:99138 slab_unreclaimable:3419
[1527133.672632]  mapped:177 shmem:233 pagetables:110
[1527133.672632]  sec_pagetables:0 bounce:0
[1527133.672632]  kernel_misc_reclaimable:0
[1527133.672632]  free:50574 free_pcp:62 free_cma:22870
[1527133.672648] Node 0 active_anon:412kB inactive_anon:3636kB active_file:1460kB inactive_file:1281264kB unevictable:0kB isolated(anon):0kB isolated(file):4kB mapped:708kB dirty:0kB writeback:0kB shmem:932kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB kernel_stack:1760kB pagetables:440kB sec_pagetables:0kB all_unreclaimable? no
[1527133.672663] DMA free:202296kB boost:0kB min:22528kB low:28160kB high:33792kB reserved_highatomic:2048KB active_anon:412kB inactive_anon:3636kB active_file:1460kB inactive_file:1281264kB unevictable:0kB writepending:0kB present:2097152kB managed:1916044kB mlocked:0kB bounce:0kB free_pcp:252kB local_pcp:184kB free_cma:91480kB
[1527133.672680] lowmem_reserve[]: 0 0 0 0
[1527133.672693] DMA: 15619*4kB (UMEC) 8377*8kB (UMEC) 385*16kB (C) 175*32kB (C) 211*64kB (C) 75*128kB (C) 12*256kB (C) 0*512kB 0*1024kB 15*2048kB (C) 1*4096kB (C) = 202244kB
[1527133.672738] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[1527133.672743] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB
[1527133.672750] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[1527133.672755] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB
[1527133.672759] 273713 total pagecache pages
[1527133.672762] 0 pages in swap cache
[1527133.672765] Free swap  = 0kB
[1527133.672768] Total swap = 0kB
[1527133.672770] 524288 pages RAM
[1527133.672772] 0 pages HighMem/MovableOnly
[1527133.672775] 45277 pages reserved
[1527133.672777] 32768 pages cma reserved
[1527133.672779] 0 pages hwpoisoned
[1527133.672782] Tasks state (memory values in pages):
[1527133.672785] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[1527133.672815] [    318]   102   318     1349      160    45056        0             0 dbus-daemon
[1527133.672825] [    322]     0   322     1952      129    53248        0             0 connmand
[1527133.672835] [    329]     0   329      755       64    40960        0             0 dropbear
[1527133.672844] [    341]     0   341     3634      224    69632        0             0 wpa_supplicant
[1527133.672854] [    345]     0   345      874       96    45056        0             0 syslogd
[1527133.672863] [    349]     0   349      874       64    49152        0             0 klogd
[1527133.672873] [    494]     0   494      874       64    45056        0             0 getty
[1527133.672882] [   7989]     0  7989      946      192    53248        0             0 start_getty
[1527133.672892] [   7991]     0  7991     1045      256    49152        0             0 sh
[1527133.672902] [  14450]     0 14450      946      160    45056        0             0 test_samba_blfl
[1527133.672916] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),task=sh,pid=7991,uid=0
[1527133.672950] Out of memory: Killed process 7991 (sh) total-vm:4180kB, anon-rss:384kB, file-rss:640kB, shmem-rss:0kB, UID:0 pgtables:48kB oom_score_adj:0

While the script was running, I monitored the system memory with the free command.
The buff/cache amount increases linearly and the free amount decreases linearly until the overcommit ratio is reached and the kernel frees up memory.
During that time the available amount stays at a constantly high value (about 1.8 GB of 2 GB available).
This looks to me like expected caching behavior - until the OOM killer is triggered.
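
For reference, the free output was sampled with a trivial loop along these lines:

  # print memory usage every 10 seconds while the write loop is running
  while true; do date; free; sleep 10; done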

I tried various options to prevent the OOM killer (the corresponding commands are sketched below the list):
- CIFS mount options vers=2.1 and vers=3.1.1, cache=strict and cache=loose
- Increase the CIFSMaxBufSize of the cifs kernel module to the maximum
- Increase the VFS cache pressure
- Disable overcommit with echo 2 > /proc/sys/vm/overcommit_memory
- Periodically drop caches manually with echo 3 > /proc/sys/vm/drop_caches
None of these prevented the OOM killer from triggering.
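
For reference, the tweaks from the list translate to commands along these lines; the exact vfs_cache_pressure and CIFSMaxBufSize values are examples:

  # more aggressive reclaim of dentry/inode caches
  echo 10000 > /proc/sys/vm/vfs_cache_pressure
  # strict overcommit accounting
  echo 2 > /proc/sys/vm/overcommit_memory
  # drop page cache, dentries and inodes (run periodically)
  echo 3 > /proc/sys/vm/drop_caches
  # larger CIFS buffer size, applied when (re)loading the cifs module
  modprobe cifs CIFSMaxBufSize=130048
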
Only disabling the cache with the cache=none mount option prevents the OOM killer (though it still slowly fills the buff/cache memory somehow),
but the write performance impact (around 100 ms per write) is too high for our performance goals.
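
The cache=none test used the same mount as above with only the cache option changed, roughly:

  mount -t cifs //192.168.103.126/share /home/root/share \
        -o vers=3.1.1,sec=none,cache=none,soft,actimeo=1,closetimeo=1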

I also searched the mailing list archives and came across this message
https://lore.kernel.org/linux-cifs/2db05b3eb59bfb59688e7cb435c1b5f2096b8f8a.camel@xxxxxxxxxx/
that mentions the OOM killer being triggered by the xfstest generic/531. But I'm not sure whether that is relevant to this issue.

I am out of ideas for ways to work around or fix this issue.
Does anybody here have an idea for a workaround?
What information would help to identify the cause of the issue?

Many thanks for your help and sorry for this long message.

Regards, 

Martin Rösch

Kistler Instrumente AG
Eulachstrasse 22, 8408 Winterthur, Switzerland
martin.roesch@xxxxxxxxxxx, www.kistler.com



