Take the hint from this: "544 pgs not deep-scrubbed in time". Your OSDs are unable to deep-scrub their data in time, most likely because they cannot cope with the combined client and scrubbing I/O, i.e. there is too much data on too few, too slow spindles.

You can play with osd_deep_scrub_interval and increase the deep scrub interval from the default 604800 seconds (1 week) to 1209600 (2 weeks) or more. It may also be a good idea to manually force deep scrubs on some PGs so that the scrubbing load gets spread more evenly over the selected period.
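For example, something along these lines should do it (a rough sketch only, not tested against your cluster; it assumes jq is installed, and the 2-week interval and the 20 PGs listed are just illustrative values):

  # raise the deep scrub interval cluster-wide to 2 weeks
  ceph config set osd osd_deep_scrub_interval 1209600

  # list the 20 PGs with the oldest deep scrub timestamps
  ceph pg dump pgs --format=json 2>/dev/null \
    | jq -r '(.pg_stats? // .)[] | [.pgid, .last_deep_scrub_stamp] | @tsv' \
    | sort -k2 | head -20

  # then manually kick off deep scrubs on a few of them, e.g.
  ceph pg deep-scrub <pgid>

The interval change should apply at runtime on 14.2.x; you can verify what the OSDs picked up with 'ceph config get osd osd_deep_scrub_interval'.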
But in general this is not a balanced setup, and little can be done to alleviate the lack of spindle performance.

/Z

On Wed, 8 Nov 2023 at 17:22, <prabhav@xxxxxxx> wrote:
> Hi Eugen,
> Please find the details below.
>
> root@meghdootctr1:/var/log/ceph# ceph -s
>   cluster:
>     id:     c59da971-57d1-43bd-b2b7-865d392412a5
>     health: HEALTH_WARN
>             nodeep-scrub flag(s) set
>             544 pgs not deep-scrubbed in time
>
>   services:
>     mon: 3 daemons, quorum meghdootctr1,meghdootctr2,meghdootctr3 (age 5d)
>     mgr: meghdootctr1(active, since 5d), standbys: meghdootctr2, meghdootctr3
>     mds: 3 up:standby
>     osd: 36 osds: 36 up (since 34h), 36 in (since 34h)
>          flags nodeep-scrub
>
>   data:
>     pools:   2 pools, 544 pgs
>     objects: 10.14M objects, 39 TiB
>     usage:   116 TiB used, 63 TiB / 179 TiB avail
>     pgs:     544 active+clean
>
>   io:
>     client: 24 MiB/s rd, 16 MiB/s wr, 2.02k op/s rd, 907 op/s wr
>
> Ceph version:
> root@meghdootctr1:/var/log/ceph# ceph --version
> ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus (stable)
>
> Ceph df -h
> https://pastebin.com/1ffucyJg
>
> Ceph OSD performance dump
> https://pastebin.com/1R6YQksE
>
> ceph tell osd.XX bench (out of 36 OSDs, only 8 give a high IOPS value of
> 250+; 4 of those are from HP 3PAR and 4 from DELL EMC. We use only 4 OSDs
> from HP 3PAR and they have worked fine without any latency or IOPS issues
> from the beginning, but the remaining 32 OSDs are from DELL EMC, of which
> 4 perform much better than the remaining 28.)
> https://pastebin.com/CixaQmBi
>
> Please help me to identify whether the issue is with the DELL EMC storage,
> Ceph configuration parameter tuning, or overload in the cloud setup.
>
> > On November 1, 2023 at 9:48 PM Eugen Block <eblock@xxxxxx> wrote:
> > Hi,
> >
> > for starters please add more cluster details like 'ceph status', 'ceph
> > versions', 'ceph osd df tree'. Increasing the network to 10G was the
> > right thing to do, you don't get far with 1G with real cluster load.
> > How are the OSDs configured (HDD only, SSD only or HDD with rocksdb on
> > SSD)? How is the disk utilization?
> >
> > Regards,
> > Eugen
> >
> > Zitat von prabhav@xxxxxxx:
> >
> > > In a production setup, 36 OSDs (SAS disks) totalling 180 TB are
> > > allocated to a single Ceph cluster with 3 monitors and 3 managers.
> > > There were 830 volumes and VMs created in OpenStack with Ceph as the
> > > backend. On Sep 21, users reported slowness in accessing the VMs.
> > > Analysing the logs led us to problems with SAS, network congestion
> > > and Ceph configuration (as all default values were used). We updated
> > > the network from 1 Gbps to 10 Gbps for the public and cluster
> > > networks. There was no change.
> > > The Ceph benchmark showed that 28 out of 36 OSDs reported very low
> > > IOPS of 30 to 50, while the remaining OSDs showed 300+ IOPS.
> > > We gradually started reducing the load on the Ceph cluster and the
> > > volume count is now 650. The slow operations have gradually reduced,
> > > but I am aware that this is not the solution.
> > > The Ceph configuration was updated with:
> > > osd_journal_size = 10 GB
> > > osd_max_backfills = 1
> > > osd_recovery_max_active = 1
> > > osd_recovery_op_priority = 1
> > > bluestore_cache_trim_max_skip_pinned = 10000
> > >
> > > After one month, we now face another issue: the mgr daemon stopped on
> > > all 3 quorum nodes and 16 OSDs went down. We could not find the
> > > reason in the ceph-mon and ceph-mgr logs. Please guide me, as this is
> > > a production setup.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx