Hi,

here I describe one of the two major issues I'm currently facing in my 8-node Ceph cluster (2x MDS, 6x OSD).

The issue is that I cannot start any virtual machine (KVM) or container (LXC); the boot process just hangs after a few seconds.
All these KVMs and LXCs have in common that their virtual disks reside in the same pool: hdd

This pool hdd is relatively small compared to the largest pool: hdb_backup

root@ld3955:~# rados df
POOL_NAME               USED  OBJECTS   CLONES     COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED     RD_OPS        RD     WR_OPS       WR  USED COMPR  UNDER COMPR
backup                   0 B         0       0          0                   0        0         0          0       0 B          0      0 B         0 B          0 B
hdb_backup           589 TiB  51262212       0  153786636                   0        0    124895   12266095   4.3 TiB  247132863  463 TiB         0 B          0 B
hdd                  3.2 TiB    281884    6568     845652                   0        0      1658  275277357    16 TiB  208213922   10 TiB         0 B          0 B
pve_cephfs_data      955 GiB     91832       0     275496                   0        0      3038       2103  1021 MiB     102170  318 GiB         0 B          0 B
pve_cephfs_metadata  486 MiB        62       0        186                   0        0         7        860   1.4 GiB      12393  166 MiB         0 B          0 B

total_objects    51635990
total_used       597 TiB
total_avail      522 TiB
total_space      1.1 PiB

This is the current health status of the Ceph cluster:

  cluster:
    id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae
    health: HEALTH_ERR
            1 filesystem is degraded
            1 MDSs report slow metadata IOs
            1 backfillfull osd(s)
            87 nearfull osd(s)
            1 pool(s) backfillfull
            Reduced data availability: 54 pgs inactive, 47 pgs peering, 1 pg stale
            Degraded data redundancy: 129598/154907946 objects degraded (0.084%), 33 pgs degraded, 33 pgs undersized
            Degraded data redundancy (low space): 322 pgs backfill_toofull
            1 subtrees have overcommitted pool target_size_bytes
            1 subtrees have overcommitted pool target_size_ratio
            1 pools have too many placement groups
            21 slow requests are blocked > 32 sec

  services:
    mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 14h)
    mgr: ld5507(active, since 16h), standbys: ld5506, ld5505
    mds: pve_cephfs:1/1 {0=ld3955=up:replay} 1 up:standby
    osd: 360 osds: 356 up, 356 in; 382 remapped pgs

  data:
    pools:   5 pools, 8868 pgs
    objects: 51.64M objects, 197 TiB
    usage:   597 TiB used, 522 TiB / 1.1 PiB avail
    pgs:     0.056% pgs unknown
             0.553% pgs not active
             129598/154907946 objects degraded (0.084%)
             2211119/154907946 objects misplaced (1.427%)
             8458 active+clean
             298  active+remapped+backfill_toofull
             29   remapped+peering
             24   active+undersized+degraded+remapped+backfill_toofull
             22   active+remapped+backfill_wait
             17   peering
             5    unknown
             5    active+recovery_wait+undersized+degraded+remapped
             3    active+undersized+degraded+remapped+backfill_wait
             2    activating+remapped
             1    active+clean+remapped
             1    stale+peering
             1    active+remapped+backfilling
             1    active+recovering+undersized+remapped
             1    active+recovery_wait+degraded

  io:
    client:   9.2 KiB/s wr, 0 op/s rd, 1 op/s wr

I believe the cluster is busy with rebalancing pool hdb_backup. I set the balancer mode to upmap recently, after the 589 TiB of data had been written.

root@ld3955:~# ceph balancer status
{
    "active": true,
    "plans": [],
    "mode": "upmap"
}

In order to resolve the issue with pool hdd I started some investigation. The first step was to install the NIC drivers provided by Mellanox. Then I configured some kernel parameters recommended by Mellanox (<https://community.mellanox.com/s/article/linux-sysctl-tuning>). However, this didn't fix the issue.

In my opinion I must get rid of all the "slow requests are blocked" warnings. When I check the output of ceph health detail, every OSD listed under REQUEST_SLOW belongs to pool hdd. This means none of the disks belonging to pool hdb_backup shows a comparable behaviour.
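For reference, this is roughly how I mapped the slow OSDs to pool hdd; the OSD id 37 and the rule name are only example placeholders here, the actual values come from the health output and from my CRUSH map:

    # OSD ids currently reporting slow requests
    ceph health detail | grep -i slow
    # CRUSH rule (and therefore the set of OSDs) used by pool hdd
    ceph osd pool get hdd crush_rule
    ceph osd crush rule dump <rule-name-from-previous-command>
    # CRUSH location / host of one of the slow OSDs, e.g. osd.37
    ceph osd find 37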
Then I checked the running processes on the different OSD nodes; I use the tool "glances" for this. Here I can see single processes that have been running for hours and are consuming a lot of CPU, e.g.:

66.8  0.2  2.13G  1.17G  1192756  ceph  17h8:33   58  0  S  41M  2K  /usr/bin/ceph-osd -f --cluster ceph --id 37 --setuser ceph --setgroup ceph
34.2  0.2  4.31G  1.20G  971267   ceph  15h38:46  58  0  S  14M  3K  /usr/bin/ceph-osd -f --cluster ceph --id 73 --setuser ceph --setgroup ceph

Similar processes are running on 4 OSD nodes. All these processes have in common that the relevant OSD belongs to pool hdd.

Furthermore, glances gives me this alert:
CRITICAL on CPU_IOWAIT (Min:1.9 Mean:2.3 Max:2.6): ceph-osd, ceph-osd, ceph-osd

What can / should I do now?
Kill the long-running processes?
Stop the relevant OSDs?

Please advise.

THX
Thomas
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx