For example, one of my latest OSD crashes looks like this in dmesg:

[Dec 2 08:26] bstore_mempool invoked oom-killer: gfp_mask=0x24200ca(GFP_HIGHUSER_MOVABLE), nodemask=0, order=0, oom_score_adj=0
[ +0.000006] bstore_mempool cpuset=ed46e6fa52c1e40f13389b349c54e62dcc8c65d76c4c7860e2ff7c39444d14cc mems_allowed=0
[ +0.000010] CPU: 3 PID: 3061712 Comm: bstore_mempool Tainted: G W 4.9.312-7 #1
[ +0.000002] Hardware name: Hardkernel ODROID-HC4 (DT)
[ +0.000001] Call trace:
[ +0.000011] [<ffffff800908cce0>] dump_backtrace+0x0/0x230
[ +0.000004] [<ffffff800908cf38>] show_stack+0x28/0x34
[ +0.000005] [<ffffff80094863b8>] dump_stack+0xb0/0xe8
[ +0.000006] [<ffffff8009246378>] dump_header+0x70/0x1d8
[ +0.000005] [<ffffff80091ceaec>] oom_kill_process+0xec/0x490
[ +0.000004] [<ffffff80091cf1e4>] out_of_memory+0x124/0x2e0
[ +0.000003] [<ffffff8009237f98>] mem_cgroup_out_of_memory+0x58/0x80
[ +0.000003] [<ffffff800923e0fc>] mem_cgroup_oom_synchronize+0x35c/0x3d4
[ +0.000003] [<ffffff80091cf3bc>] pagefault_out_of_memory+0x1c/0x80
[ +0.000004] [<ffffff800909f3bc>] do_page_fault+0x38c/0x3b0
[ +0.000003] [<ffffff800909f4b0>] do_translation_fault+0xd0/0xf0
[ +0.000003] [<ffffff8009081338>] do_mem_abort+0x58/0xb0
[ +0.000003] Exception stack(0xffffffc026637df0 to 0xffffffc026637f20)
[ +0.000002] 7de0: 0000007f86d50c78 0000000082000007
[ +0.000004] 7e00: ffffffc026637ec0 0000007f86d50c78 ffffffc08ae3b900 ffffffc08ae3b900
[ +0.000002] 7e20: ffffffc026637ec0 00000055851577c8 0000000060000000 00000000000409ff
[ +0.000003] 7e40: 0000000000000000 ffffff80090837c0 ffffffc026637e90 ffffff800908c51c
[ +0.000003] 7e60: ffffffc026637e90 ffffff800908145c 0000007f86d50c78 0000000082000007
[ +0.000003] 7e80: 0000000000000008 ffffffc08ae3b900 0000000000000000 ffffff800908340c
[ +0.000002] 7ea0: 0000000000000000 00000040c4e3f000 ffffffffffffffff 00000040c4e3f000
[ +0.000003] 7ec0: 0000000000000000 0000007f870d8b88 0000000016e4c66f 108890ff61ee90a0
[ +0.000003] 7ee0: 00000055c5db2828 000000000000017f 0000007f870d6000 00000000219d0e6c
[ +0.000003] 7f00: 0000000000000000 003b9aca00000000 000000006389b6e9 0000000016e4c66f
[ +0.000003] [<ffffff8009081470>] do_el0_ia_bp_hardening+0x90/0xa0
[ +0.000001] Exception stack(0xffffffc026637ea0 to 0xffffffc026637fd0)
[ +0.000003] 7ea0: 0000000000000000 00000040c4e3f000 ffffffffffffffff 00000040c4e3f000
[ +0.000003] 7ec0: 0000000000000000 0000007f870d8b88 0000000016e4c66f 108890ff61ee90a0
[ +0.000003] 7ee0: 00000055c5db2828 000000000000017f 0000007f870d6000 00000000219d0e6c
[ +0.000003] 7f00: 0000000000000000 003b9aca00000000 000000006389b6e9 0000000016e4c66f
[ +0.000002] 7f20: 0000000000000018 000000006389b6e9 0016a9ab002471a6 00003b1b6f26b535
[ +0.000003] 7f40: 0000005585ece630 0000007f86d50c78 0000000000000000 0000005605fb2130
[ +0.000003] 7f60: 0000007f768f9ce8 0000000000000000 0000000000000000 000000001003e8f3
[ +0.000003] 7f80: 000000001003df58 000000001626e380 000000003b9aca00 112e0be826d694b3
[ +0.000002] 7fa0: 0004d7bb2ef84792 0000007f768f9b40 00000055859871ac 0000007f768f9b40
[ +0.000002] 7fc0: 0000007f86d50c78 0000000000000000
[ +0.000003] [<ffffff800908340c>] el0_ia+0x18/0x1c
[ +0.000002] Task in /docker/ed46e6fa52c1e40f13389b349c54e62dcc8c65d76c4c7860e2ff7c39444d14cc killed as a result of limit of /docker/ed46e6fa52c1e40f13389b349c54e62dcc8c65d76c4c7860e2ff7c39444d14cc
[ +0.000011] memory: usage 3072000kB, limit 3072000kB, failcnt 4030563
[ +0.000002] memory+swap: usage 4059888kB, limit 6144000kB, failcnt 0
[ +0.000002] kmem: usage 4596kB, limit 9007199254740988kB, failcnt 0
[ +0.000001] Memory cgroup stats for /docker/ed46e6fa52c1e40f13389b349c54e62dcc8c65d76c4c7860e2ff7c39444d14cc: cache:1308KB rss:3066096KB rss_huge:4096KB mapped_file:308KB dirty:0KB writeback:0KB swap:987888KB inactive_anon:613232KB active_anon:2452864KB inactive_file:864KB active_file:348KB unevictable:0KB
[ +0.000021] [ pid ]   uid  tgid total_vm     rss nr_ptes nr_pmds swapents oom_score_adj name
[ +0.000180] [3061162]    0 3061162      214       0       4       3        8             0 docker-init
[ +0.000004] [3061175]  167 3061175  1294319  764584    2298       9   248700             0 ceph-osd
[ +0.000015] Memory cgroup out of memory: Kill process 3061175 (ceph-osd) score 985 or sacrifice child
[ +0.004798] Killed process 3061175 (ceph-osd) total-vm:5177276kB, anon-rss:3058332kB, file-rss:0kB, shmem-rss:0kB
[ +1.042284] oom_reaper: reaped process 3061175 (ceph-osd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

On Fri, Dec 2, 2022 at 09:47, Daniel Brunner <daniel@brunner.ninja> wrote:
> Hi,
>
> my OSDs are running on odroid-hc4's, which only have about 4 GB of memory,
> and every 10 minutes a random OSD crashes due to running out of memory.
> Sadly, the whole machine becomes unresponsive when memory is completely
> exhausted, so there is no SSH access or Prometheus output in the meantime.
>
> After the OSD has crashed and restarted, and the memory is free again,
> I can look into the machine again.
>
> I've set the memory limit very low on all OSDs:
>
> for i in {0..17} ; do sudo ceph config set osd.$i osd_memory_target 939524096 ; done
>
> which is the absolute minimum, about 0.9 GB.
>
> Why are the OSDs not respecting this limit? I tried enforcing the memory
> limit on the docker container by appending -m3200M to the docker run
> command, which helps with the unresponsiveness when memory runs out.
> The Linux kernel now kills the ceph-osd process earlier, while a few
> memory resources are still left.
>
> How can I stop ceph-osd from crashing? Decreasing pg_num and pgp_num
> on my only CephFS pool did not work; the number is still high after
> setting it to 16.
>
> Best regards

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
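One detail worth keeping in mind here: osd_memory_target is a best-effort target that BlueStore's cache autotuning tries to stay under, not a hard cap, so the process can overshoot it. A common approach is to leave headroom between the target and any hard container limit. A minimal sketch of configuring both consistently (the 3 GiB container limit and the 75% ratio are illustrative assumptions, not values from this thread):

```shell
# Assumption: a 3 GiB hard limit on the container (docker run -m),
# with osd_memory_target set to ~75% of it so the OSD has headroom
# before the cgroup OOM killer fires. Both numbers are examples.

container_limit_bytes=$(( 3 * 1024 * 1024 * 1024 ))     # 3 GiB hard limit
osd_target_bytes=$(( container_limit_bytes * 3 / 4 ))   # ~25% headroom

# Apply the target to all 18 OSDs, as in the loop quoted above.
for i in {0..17}; do
  sudo ceph config set "osd.$i" osd_memory_target "$osd_target_bytes"
done

echo "$osd_target_bytes"   # 2415919104 bytes (~2.25 GiB)
```

The matching hard limit would then be passed when starting the container, e.g. `docker run -m 3g ...`. Whether 75% is enough headroom depends on the workload; the key point is only that the target should sit well below the cgroup limit.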