Hi,

On our embedded Linux device, we mount a Samba share and periodically write small files to it (around 70 KB, 20 times a second). After several hours (or after about 400'000 files have been written), the OOM killer starts to kill processes until it eventually also kills the writing process, rendering the device unusable.

I was able to reproduce the issue with a simple shell script that writes small files with random data to the Samba share in a loop using dd (a trimmed-down sketch of the script is included further down). The Samba server (version 4.15.13 on Ubuntu 20.04) is in the same LAN as the device, so throughput and latency are not an issue.

The device is a TI AM62x SoC (basically a BeaglePlay board, https://www.beagleboard.org/boards/beagleplay) running kernel 6.6.32 from TI with realtime patches (built with Yocto and the meta-ti layer). On the device the share is mounted with BusyBox mount (v1.36.1) and the following (default) options:

//192.168.103.126/share on /home/root/share type cifs (rw,relatime,vers=3.1.1,sec=none,cache=strict,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.103.126,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,reparse=nfs,rsize=4194304,wsize=4194304,bsize=1048576,retrans=1,echo_interval=60,actimeo=1,closetimeo=1)

When the OOM killer is eventually triggered, dmesg contains reports like:

[1527133.672369] test_samba_blfl invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=2, oom_score_adj=0
[1527133.672431] CPU: 1 PID: 14450 Comm: test_samba_blfl Tainted: G O 6.6.32-rt32-ti-rt-01519-g2cc066b2c5d1-dirty #1
[1527133.672443] Hardware name: Kistler EVK with Skyboard V2 using AM62x (DT)
[1527133.672449] Call trace:
[1527133.672454]  dump_backtrace+0xa8/0x118
[1527133.672478]  show_stack+0x1c/0x30
[1527133.672486]  dump_stack_lvl+0x44/0x58
[1527133.672497]  dump_stack+0x14/0x20
[1527133.672504]  dump_header+0x4c/0x2c8
[1527133.672514]  oom_kill_process+0x364/0x560
[1527133.672521]  out_of_memory+0xac/0x460
[1527133.672527]  __alloc_pages+0x94c/0xcb8
[1527133.672541]  copy_process+0x168/0x12d0
[1527133.672549]  kernel_clone+0x88/0x388
[1527133.672555]  __do_sys_clone+0x5c/0x78
[1527133.672560]  __arm64_sys_clone+0x24/0x38
[1527133.672567]  el0_svc_common.constprop.0+0x60/0x138
[1527133.672576]  do_el0_svc+0x20/0x30
[1527133.672584]  el0_svc+0x18/0x50
[1527133.672592]  el0t_64_sync_handler+0x118/0x128
[1527133.672601]  el0t_64_sync+0x14c/0x150
[1527133.672626] Mem-Info:
[1527133.672632] active_anon:103 inactive_anon:909 isolated_anon:0
[1527133.672632]  active_file:365 inactive_file:320316 isolated_file:1
[1527133.672632]  unevictable:0 dirty:0 writeback:0
[1527133.672632]  slab_reclaimable:99138 slab_unreclaimable:3419
[1527133.672632]  mapped:177 shmem:233 pagetables:110
[1527133.672632]  sec_pagetables:0 bounce:0
[1527133.672632]  kernel_misc_reclaimable:0
[1527133.672632]  free:50574 free_pcp:62 free_cma:22870
[1527133.672648] Node 0 active_anon:412kB inactive_anon:3636kB active_file:1460kB inactive_file:1281264kB unevictable:0kB isolated(anon):0kB isolated(file):4kB mapped:708kB dirty:0kB writeback:0kB shmem:932kB shmem_thp:0kB shmem_pmdmapped:0kB anon_thp:0kB writeback_tmp:0kB kernel_stack:1760kB pagetables:440kB sec_pagetables:0kB all_unreclaimable? no
[1527133.672663] DMA free:202296kB boost:0kB min:22528kB low:28160kB high:33792kB reserved_highatomic:2048KB active_anon:412kB inactive_anon:3636kB active_file:1460kB inactive_file:1281264kB unevictable:0kB writepending:0kB present:2097152kB managed:1916044kB mlocked:0kB bounce:0kB free_pcp:252kB local_pcp:184kB free_cma:91480kB
[1527133.672680] lowmem_reserve[]: 0 0 0 0
[1527133.672693] DMA: 15619*4kB (UMEC) 8377*8kB (UMEC) 385*16kB (C) 175*32kB (C) 211*64kB (C) 75*128kB (C) 12*256kB (C) 0*512kB 0*1024kB 15*2048kB (C) 1*4096kB (C) = 202244kB
[1527133.672738] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[1527133.672743] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB
[1527133.672750] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[1527133.672755] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB
[1527133.672759] 273713 total pagecache pages
[1527133.672762] 0 pages in swap cache
[1527133.672765] Free swap  = 0kB
[1527133.672768] Total swap = 0kB
[1527133.672770] 524288 pages RAM
[1527133.672772] 0 pages HighMem/MovableOnly
[1527133.672775] 45277 pages reserved
[1527133.672777] 32768 pages cma reserved
[1527133.672779] 0 pages hwpoisoned
[1527133.672782] Tasks state (memory values in pages):
[1527133.672785] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[1527133.672815] [    318]   102   318     1349      160    45056        0             0 dbus-daemon
[1527133.672825] [    322]     0   322     1952      129    53248        0             0 connmand
[1527133.672835] [    329]     0   329      755       64    40960        0             0 dropbear
[1527133.672844] [    341]     0   341     3634      224    69632        0             0 wpa_supplicant
[1527133.672854] [    345]     0   345      874       96    45056        0             0 syslogd
[1527133.672863] [    349]     0   349      874       64    49152        0             0 klogd
[1527133.672873] [    494]     0   494      874       64    45056        0             0 getty
[1527133.672882] [   7989]     0  7989      946      192    53248        0             0 start_getty
[1527133.672892] [   7991]     0  7991     1045      256    49152        0             0 sh
[1527133.672902] [  14450]     0 14450      946      160    45056        0             0 test_samba_blfl
[1527133.672916] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),task=sh,pid=7991,uid=0
[1527133.672950] Out of memory: Killed process 7991 (sh) total-vm:4180kB, anon-rss:384kB, file-rss:640kB, shmem-rss:0kB, UID:0 pgtables:48kB oom_score_adj:0

While the script was running, I monitored the system memory with the free command: the buff/cache amount increases linearly and the free amount decreases linearly until the overcommit ratio is reached and the kernel frees up memory. During all that time, the available amount stays constant at a high value (around 1.8 GB of the 2 GB). This looks to me like expected caching behavior - right up until the OOM killer is triggered.

I tried various options to prevent the OOM killer:
- cifs mount options vers=2.1/3.1.1 and cache=strict/loose
- increasing the CIFSMaxBufSize parameter of the cifs kernel module to its maximum
- increasing the VFS cache pressure
- disabling overcommit with echo 2 > /proc/sys/vm/overcommit_memory
- periodically dropping the caches manually with echo 3 > /proc/sys/vm/drop_caches

None of these prevented the OOM killer from triggering. Only disabling the cache with the cache=none mount option prevents the OOM killer (though it still slowly fills the buff/cache memory somehow), but the write performance impact (around 100 ms per write) is too much for our performance goals.
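For reference, the tuning attempts above boil down to commands along these lines (the sysctl values shown are examples, not necessarily the exact values I tested, and the cache=none line is only a sketch of how the share was remounted):

# raise the VFS cache pressure (default is 100; the value here is an example)
echo 500 > /proc/sys/vm/vfs_cache_pressure

# disable overcommit
echo 2 > /proc/sys/vm/overcommit_memory

# drop page cache, dentries and inodes (run periodically from a script)
echo 3 > /proc/sys/vm/drop_caches

# CIFSMaxBufSize is a module parameter, so it is set when loading cifs, e.g.:
#   modprobe cifs CIFSMaxBufSize=130048

# cache=none variant (the only thing that avoided the OOM killer, but too slow):
mount -t cifs //192.168.103.126/share /home/root/share -o vers=3.1.1,sec=none,cache=none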
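For completeness, the reproduction script mentioned at the top is essentially the following trimmed-down loop (the target path, file size and delay here are illustrative; the real test writes ~70 KB files about 20 times a second):

#!/bin/sh
# Write small files with random data to the mounted share in a loop.
i=0
while true; do
    dd if=/dev/urandom of=/home/root/share/test_$i.bin bs=70k count=1 2>/dev/null
    i=$((i + 1))
    sleep 0.05   # BusyBox sleep accepts fractional seconds if built with FLOAT_SLEEP
done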
I also searched the mailing list archives and came across this message, https://lore.kernel.org/linux-cifs/2db05b3eb59bfb59688e7cb435c1b5f2096b8f8a.camel@xxxxxxxxxx/, which mentions the OOM killer being triggered by the xfstest generic/531. But I'm not sure whether that is relevant to this issue.

I am out of ideas for ways to work around or fix this issue. Does anybody here have an idea for a workaround? What information would help to identify the cause of the issue?

Many thanks for your help, and sorry for the long message.

Regards,
Martin Rösch

Kistler Instrumente AG
Eulachstrasse 22, 8408 Winterthur, Switzerland
martin.roesch@xxxxxxxxxxx, www.kistler.com