Dear Christian,

thank you for your reply. I think I found the cause of my problem (a leader election loop, with the leader being evicted from quorum and returning every 5s) and was able to resolve it by rebooting the host. I don't know whether that addresses the root cause, but I'm happy enough with the situation and can provide more logs if that's helpful for others.

A bit of history on the resolution:

I checked, and memory pressure indeed does not appear to be the problem; see below for the output of a `tell ... heap stats` command. I'm running a stock CentOS 7 kernel (3.10). This server has 35 OSDs and 250 GB of RAM in total, of which ~150 G are displayed as used, so I now understand that the ~100 G of cache should not hurt me.

I had a closer look, including at the log output of the monitors. It appears that the mon on my osd-2 server is regularly thrown out of quorum and then joins back in as the leader. A leader election is started every 5 seconds or so, and I'm seeing 100% CPU load on the mon processes. I dug around the mailing list and the issue tracker, and similar problems appear to have occurred in the past with different versions and apparently for different reasons.

I'm not seeing this problem on an identically built "mirror" cluster, which also contradicts my original assumption about memory usage. The mirror system does not have its file system mounted, and no sync process is running at the moment.

In the spirit of "have you tried turning it off and on again?", I rebooted my osd-2 host and - the leader election loop disappears. There is only one leader election when the host comes back, and that's it. Wonderful.
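For anyone who wants to check for the same symptom later, this is roughly how it shows up on the command line. This is only a sketch: the systemd unit name assumes cephadm's usual ceph-<fsid>@<daemon> naming, so the fsid placeholder and the hostname need adjusting.

# how often did this mon call for an election recently?
journalctl -u ceph-<fsid>@mon.osd-2 --since "1 hour ago" | grep -c "calling monitor election"

# quorum membership, current leader and election epoch (a constantly climbing epoch matches the loop described above)
ceph quorum_status -f json-pretty | grep -E '"election_epoch"|"quorum_leader_name"|"quorum_names"'
ceph mon stat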
Best wishes,
Manuel

--8<-- --8<-- mon.osd-2 log output --8<-- --8<--
Sep 29 07:12:56 osd-2 bash[13839]: debug 2021-09-29T05:12:56.868+0000 7f25cdf36700 1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f25cdf36700' had timed out after 0.000000000s <repeats several times>
Sep 29 07:13:11 osd-2 bash[13839]: debug 2021-09-29T05:13:11.900+0000 7f25cff3a700 0 log_channel(cluster) log [INF] : mon.osd-2 is new leader, mons osd-2,osd-5,osd-4,osd-3 in quorum (ranks 1,2,3,4)
Sep 29 07:13:12 osd-2 bash[13839]: debug 2021-09-29T05:13:12.291+0000 7f25cff3a700 0 mon.osd-2@1(leader) e11 handle_command mon_command([{prefix=config-key set, key=device/Micron_5300_MTFDDAK1T9TDS_202528E4AA91}] v 0) v1
Sep 29 07:13:12 osd-2 bash[13839]: debug 2021-09-29T05:13:12.291+0000 7f25cff3a700 0 mon.osd-2@1(leader) e11 handle_command mon_command({"prefix":"config rm","who":"mgr","name":"mgr/rbd_support/osd-2/mirror_snapshot_schedule"} v 0) v1
Sep 29 07:13:12 osd-2 bash[13839]: debug 2021-09-29T05:13:12.291+0000 7f25cff3a700 0 log_channel(audit) log [INF] : from='mgr.66140493 172.16.62.11:0/3686442004' entity='mgr.osd-2' cmd=[{"prefix":"config rm","who":"mgr","name":"mgr/rbd_support/osd-2/mirror_snapshot_schedule"}]: dispatch
Sep 29 07:13:12 osd-2 bash[13839]: debug 2021-09-29T05:13:12.292+0000 7f25cff3a700 0 mon.osd-2@1(leader) e11 handle_command mon_command({"prefix":"config rm","who":"mgr","name":"mgr/rbd_support/osd-2/trash_purge_schedule"} v 0) v1
Sep 29 07:13:12 osd-2 bash[13839]: debug 2021-09-29T05:13:12.292+0000 7f25cff3a700 0 log_channel(audit) log [INF] : from='mgr.66140493 172.16.62.11:0/3686442004' entity='mgr.osd-2' cmd=[{"prefix":"config rm","who":"mgr","name":"mgr/rbd_support/osd-2/trash_purge_schedule"}]: dispatch
Sep 29 07:13:12 osd-2 bash[13839]: debug 2021-09-29T05:13:12.546+0000 7f25cff3a700 0 mon.osd-2@1(leader) e11 handle_command mon_command([{prefix=config-key set, key=mgr/telemetry/last_upload}] v 0) v1
Sep 29 07:13:13 osd-2 bash[13839]: debug 2021-09-29T05:13:13.414+0000 7f25d173d700 1 mon.osd-2@1(leader) e11 handle_auth_request failed to assign global_id <repeats several times>
Sep 29 07:13:21 osd-2 bash[13839]: debug 2021-09-29T05:13:17.596+0000 7f25cff3a700 0 mon.osd-2@1(leader) e11 handle_command mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0) v1
Sep 29 07:13:21 osd-2 bash[13839]: debug 2021-09-29T05:13:21.612+0000 7f25cff3a700 -1 mon.osd-2@1(leader) e11 get_health_metrics reporting 49385 slow ops, oldest is mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0)
Sep 29 07:13:21 osd-2 bash[13839]: debug 2021-09-29T05:13:21.634+0000 7f25cd735700 0 log_channel(cluster) log [WRN] : Health detail: HEALTH_WARN 1 clients failing to respond to capability release; 1 MDSs report slow requests; 1/5 mons down, quorum osd-1,osd-5,osd-4,osd-3
Sep 29 07:13:21 osd-2 bash[13839]: debug 2021-09-29T05:13:21.635+0000 7f25cd735700 0 log_channel(cluster) log [WRN] : [WRN] MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability release
Sep 29 07:13:21 osd-2 bash[13839]: debug 2021-09-29T05:13:21.635+0000 7f25cd735700 0 log_channel(cluster) log [WRN] : mds.cephfs.osd-1.qkzuas(mds.0): Client gw-1 failing to respond to capability release client_id: 66086277
Sep 29 07:13:21 osd-2 bash[13839]: debug 2021-09-29T05:13:21.635+0000 7f25cd735700 0 log_channel(cluster) log [WRN] : [WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
Sep 29 07:13:21 osd-2 bash[13839]: debug 2021-09-29T05:13:21.635+0000 7f25cd735700 0 log_channel(cluster) log [WRN] : mds.cephfs.osd-1.qkzuas(mds.0): 8 slow requests are blocked > 30 secs
Sep 29 07:13:21 osd-2 bash[13839]: debug 2021-09-29T05:13:21.636+0000 7f25cd735700 0 log_channel(cluster) log [WRN] : [WRN] MON_DOWN: 1/5 mons down, quorum osd-1,osd-5,osd-4,osd-3
Sep 29 07:13:21 osd-2 bash[13839]: debug 2021-09-29T05:13:21.636+0000 7f25cd735700 0 log_channel(cluster) log [WRN] : mon.osd-2 (rank 1) addr [v2:172.16.62.11:3300/0,v1:172.16.62.11:6789/0] is down (out of quorum)
Sep 29 07:13:21 osd-2 bash[13839]: debug 2021-09-29T05:13:21.636+0000 7f25cd735700 0 log_channel(cluster) log [INF] : mon.osd-2 calling monitor election
Sep 29 07:13:21 osd-2 bash[13839]: debug 2021-09-29T05:13:21.636+0000 7f25cd735700 1 paxos.1).electionLogic(21993) init, last seen epoch 21993, mid-election, bumping
Sep 29 07:13:21 osd-2 bash[13839]: debug 2021-09-29T05:13:21.646+0000 7f25cd735700 1 mon.osd-2@1(electing) e11 collect_metadata md127: no unique device id for md127: fallback method has no model nor serial'
Sep 29 07:13:21 osd-2 bash[13839]: debug 2021-09-29T05:13:21.651+0000 7f25d173d700 1 mon.osd-2@1(electing) e11 handle_auth_request failed to assign global_id
Sep 29 07:13:21 osd-2 bash[13839]: debug 2021-09-29T05:13:21.683+0000 7f25cd735700 1 mon.osd-2@1(electing) e11 collect_metadata md127: no unique device id for md127: fallback method has no model nor serial'
Sep 29 07:13:21 osd-2 bash[13839]: debug 2021-09-29T05:13:21.725+0000 7f25ce737700 1 heartbeat_map reset_timeout 'Monitor::cpu_tp thread 0x7f25ce737700' had timed out after 0.000000000s <repeats several times>
Sep 29 07:13:26 osd-2 bash[13839]: debug 2021-09-29T05:13:26.865+0000 7f25cff3a700 1 mon.osd-2@1(peon).osd e491238 _set_new_cache_sizes cache_size:134217728 inc_alloc: 67108864 full_alloc: 67108864 kv_alloc: 67108864
Sep 29 07:13:26 osd-2 bash[13839]: debug 2021-09-29T05:13:26.875+0000 7f25cff3a700 -1 mon.osd-2@1(peon) e11 get_health_metrics reporting 49411 slow ops, oldest is mon_command([{prefix=config-key set, key=mgr/cephadm/host.osd-5}] v 0)
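Regarding the 100% CPU on the mon processes mentioned above: even though the daemons run in containers, their threads are visible in the host's process list. A rough way to see which mon threads are busy, assuming the containerized ceph-mon shows up with a command line matching 'ceph-mon -n mon.osd-2' (adjust the pattern for your host):

# show the busiest threads of the mon process on this host
top -H -p "$(pgrep -f 'ceph-mon -n mon.osd-2')"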
--8<-- --8<-- ceph -s output --8<-- --8<--
# ceph -s
  cluster:
    id:     55633ec3-6c0c-4a02-990c-0f87e0f7a01f
    health: HEALTH_WARN
            1 clients failing to respond to capability release
            1 MDSs report slow requests
            1/5 mons down, quorum osd-1,osd-5,osd-4,osd-3

  services:
    mon: 5 daemons, quorum osd-1,osd-2,osd-5,osd-4,osd-3 (age 2s)
    mgr: osd-2(active, since 13h), standbys: osd-3, osd-5.jcfyqe, osd-1, osd-4.oylrhe
    mds: 1/1 daemons up, 1 standby
    osd: 180 osds: 180 up (since 19h), 164 in (since 4d)
    rgw: 12 daemons active (6 hosts, 2 zones)

  data:
    volumes: 1/1 healthy
    pools:   14 pools, 5504 pgs
    objects: 263.11M objects, 944 TiB
    usage:   1.4 PiB used, 659 TiB / 2.0 PiB avail
    pgs:     5492 active+clean
             11   active+clean+scrubbing+deep
             1    active+clean+scrubbing

  io:
    client: 39 KiB/s rd, 7.0 MiB/s wr, 52 op/s rd, 57 op/s wr

--8<-- --8<-- heap stats output --8<-- --8<--
# uname -a
Linux osd-2 3.10.0-1160.42.2.el7.x86_64 #1 SMP Tue Sep 7 14:49:57 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

# ceph tell osd.6 heap stats
osd.6 tcmalloc heap stats:------------------------------------------------
MALLOC:     2482344872 ( 2367.3 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +    201900496 (  192.5 MiB) Bytes in central cache freelist
MALLOC: +      8619008 (    8.2 MiB) Bytes in transfer cache freelist
MALLOC: +     42731144 (   40.8 MiB) Bytes in thread cache freelists
MALLOC: +     15335424 (   14.6 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =   2750930944 ( 2623.5 MiB) Actual memory used (physical + swap)
MALLOC: +    601866240 (  574.0 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =   3352797184 ( 3197.5 MiB) Virtual address space used
MALLOC:
MALLOC:         165291              Spans in use
MALLOC:             40              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
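One more note for the archives: under cephadm, the /etc/ceph/ceph.conf snippet quoted below may no longer be where the limit gets picked up from; as far as I understand, the centralized config database is the equivalent place. A rough sketch (2147483648 is simply the value from the old ceph.conf quoted below):

# cephadm-era equivalent of the old "[osd] osd memory target" setting
ceph config set osd osd_memory_target 2147483648
# show the value the OSDs are currently configured with
ceph config get osd osd_memory_target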
On Wed, Sep 29, 2021 at 12:16 AM Christian Wuerdig <christian.wuerdig@xxxxxxxxx> wrote:

> buff/cache is the Linux kernel buffer and page cache which is
> unrelated to the ceph bluestore cache. Check the memory consumption of
> your individual OSD processes to confirm. Top also says 132GB
> available (since buffers and page cache entries will be dropped
> automatically if processes need more RAM) so your output certainly
> does not indicate memory pressure (quite the opposite actually).
> You can also inspect the memory buffers for individual OSDs with
>
> ceph daemon osd.X dump_mempools
> and
> ceph tell osd.X heap stats
>
> On Wed, 29 Sept 2021 at 09:47, Manuel Holtgrewe <zyklenfrei@xxxxxxxxx> wrote:
> >
> > Hello,
> >
> > I'm running ceph pacific OSD servers that are orchestrated by cephadm
> > (on docker.io v20.10.8 on CentOS 7.9). The servers are a bit ... lower
> > equipped than others when it comes to memory per OSD.
> >
> > Earlier, we were able to accommodate for this by using the following
> > /etc/ceph/ceph.conf setting:
> >
> > [osd]
> > osd memory target = 2147483648
> >
> > However, since I have switched to cephadm and pacific (I guess this issue
> > is more related to the former than the latter), I'm seeing top/htop output
> > that indicates that 50% of my memory is used by processes and the other 50%
> > are used by "buff/cache".
> >
> > # free -h
> >               total        used        free      shared  buff/cache   available
> > Mem:           251G        105G         24G        1.5G        121G        132G
> > Swap:          7.8G         40M        7.7G
> >
> > I'm seeing issues such as mon slowing down and going out of quorum that I
> > saw earlier when memory was tight. Thus, I'm assuming that memory is the
> > issue here again...
> >
> > Thanks,
> > Manuel
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> _______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx