Hi Tim,

do you see the behaviour across all devices, or does it only affect one type/manufacturer? A quick way to map each OSD to its drive model is sketched below, after the quoted message.

Joachim

www.clyso.com
Hohenzollernstr. 27, 80801 Munich
Utting a. A. | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE2754306


Tim Sauerbein <sauerbein@xxxxxxxxxx> wrote on Sun, 29 Sept 2024, 23:32:

> Dear list,
>
> I have a small cluster (Reef 18.2.4) with 7 hosts and 3-4 OSDs each
> (960GB/1.92TB mixed Intel D3-S4610, Samsung SM883, PM897 SSDs):
>
>   cluster:
>     id:     ecff3ce8-539b-443e-a492-da428f4aa9e9
>     health: HEALTH_OK
>
>   services:
>     mon: 5 daemons, quorum titan,mangan,kalium,argon,chromium (age 2w)
>     mgr: mangan(active, since 2w), standbys: titan, argon
>     osd: 22 osds: 22 up (since 2w), 22 in (since 3M)
>
>   data:
>     pools:   2 pools, 513 pgs
>     objects: 2.76M objects, 7.0 TiB
>     usage:   16 TiB used, 15 TiB / 31 TiB avail
>     pgs:     513 active+clean
>
> The cluster stores RBD volumes for virtual machines.
>
> For a couple of months now the cluster has been reporting slow ops for some
> OSDs and flagging some PGs as laggy. This happens once or twice a day,
> sometimes more and sometimes not at all for a few days, at completely random
> times, independent of when snapshots are deleted and trimmed and independent
> of the I/O load or the load on the hosts.
>
> After about 30 seconds, during which the write speed on the VMs drops to
> zero, everything returns to normal. I cannot reproduce the slow ops manually
> by creating write load on the cluster. Even writing continuously at full
> speed, 300-400 MB/s, for 20 minutes does not cause any problems.
>
> See the attached log file for an example of a typical occurrence. I have
> also measured the write load on the disks with iostat while the problem was
> happening, which just shows the writes stalling; see the second attachment.
>
> The OSDs with slow ops are completely random; any of the disks shows up
> once in a while.
>
> Current config (I've tried optimising snaptrim and scrub, which didn't help):
>
> # ceph config dump
> WHO     MASK  LEVEL     OPTION                                 VALUE         RO
> global        advanced  auth_client_required                   cephx         *
> global        advanced  auth_cluster_required                  cephx         *
> global        advanced  auth_service_required                  cephx         *
> global        advanced  bdev_async_discard                     true
> global        advanced  bdev_enable_discard                    true
> global        advanced  public_network                         10.0.4.0/24   *
> mon           advanced  auth_allow_insecure_global_id_reclaim  false
> mgr           advanced  mgr/balancer/active                    true
> mgr           advanced  mgr/balancer/mode                      upmap
> mgr           unknown   mgr/pg_autoscaler/autoscale_profile    scale-up      *
> osd           basic     osd_memory_target                      4294967296
> osd           advanced  osd_pg_max_concurrent_snap_trims       1
> osd           advanced  osd_scrub_begin_hour                   23
> osd           advanced  osd_scrub_end_hour                     4
> osd           advanced  osd_scrub_sleep                        1.000000
> osd           advanced  osd_snap_trim_priority                 1
> osd           advanced  osd_snap_trim_sleep                    2.000000
> osd.0         basic     osd_mclock_max_capacity_iops_ssd       29199.674019
> osd.1         basic     osd_mclock_max_capacity_iops_ssd       31554.530141
> osd.10        basic     osd_mclock_max_capacity_iops_ssd       25949.821194
> osd.11        basic     osd_mclock_max_capacity_iops_ssd       26300.596265
> osd.12        basic     osd_mclock_max_capacity_iops_ssd       25167.331294
> osd.13        basic     osd_mclock_max_capacity_iops_ssd       21606.610828
> osd.14        basic     osd_mclock_max_capacity_iops_ssd       27894.095121
> osd.15        basic     osd_mclock_max_capacity_iops_ssd       25929.047047
> osd.16        basic     osd_mclock_max_capacity_iops_ssd       15423.600235
> osd.17        basic     osd_mclock_max_capacity_iops_ssd       25097.493934
> osd.18        basic     osd_mclock_max_capacity_iops_ssd       25966.188007
> osd.19        basic     osd_mclock_max_capacity_iops_ssd       23628.746459
> osd.2         basic     osd_mclock_max_capacity_iops_ssd       32157.280832
> osd.20        basic     osd_mclock_max_capacity_iops_ssd       22722.682745
> osd.3         basic     osd_mclock_max_capacity_iops_ssd       33951.086556
> osd.4         basic     osd_mclock_max_capacity_iops_ssd       22736.907664
> osd.5         basic     osd_mclock_max_capacity_iops_ssd       21916.777510
> osd.6         basic     osd_mclock_max_capacity_iops_ssd       29984.954749
> osd.7         basic     osd_mclock_max_capacity_iops_ssd       26757.965797
> osd.8         basic     osd_mclock_max_capacity_iops_ssd       22738.921429
> osd.9         basic     osd_mclock_max_capacity_iops_ssd       24635.156413
>
> Any help would be much appreciated!
>
> Thanks,
> Tim
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
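
To see whether the OSDs that show slow ops share a drive model, something along
these lines should do it (a rough sketch; "device_ids" is the metadata field
name in recent releases and may be named differently in yours, and jq needs to
be installed on the node you run it from):

    # List tracked devices (the device name embeds vendor/model/serial)
    # together with the OSD daemon(s) sitting on them:
    ceph device ls

    # Or print the model/serial per OSD from its metadata:
    for id in $(ceph osd ls); do
        echo -n "osd.$id: "
        ceph osd metadata "$id" | jq -r '.device_ids'
    done

That makes it easy to check the OSD IDs from the slow-ops warnings against the
Intel and Samsung models.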
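
If you manage to catch one of the 30-second stalls, it would also be
interesting to see what the affected OSD itself recorded. Roughly, on the host
of an OSD named in the slow-ops warning, shortly after the event (these are
admin socket commands, so run them where the daemon lives and adjust to however
you access the socket, e.g. inside the container for cephadm deployments):

    # Ops that exceeded the slow-op threshold, with per-event timestamps:
    ceph daemon osd.<id> dump_historic_slow_ops

    # Ops in flight, if you catch it while the stall is still ongoing:
    ceph daemon osd.<id> dump_ops_in_flight

The event timestamps in there usually show where the time goes, e.g. sitting in
queued_for_pg, waiting for subops from other OSDs, or waiting for the local
commit.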