I'm afraid you're simply hitting the I/O limits of your disks.
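If you want to confirm that, it's worth comparing the raw per-OSD bench numbers side by side while watching iostat -x on the OSD hosts to see whether the backing LUNs are saturated. A rough sketch (untested here; it assumes an admin keyring and jq on the node, the field names of the Nautilus JSON bench output, and note that each run writes 1 GiB of test data per OSD, so run it off-peak):

    # Rough sketch: run the default 1 GiB write bench on every OSD and print
    # throughput/IOPS so the slow disks stand out. Assumes an admin keyring
    # and jq on this node; this adds real write load, so run it off-peak.
    for osd in $(ceph osd ls); do
        printf 'osd.%s: ' "$osd"
        ceph tell "osd.$osd" bench | jq -r '"\(.bytes_per_sec/1048576|floor) MiB/s, \(.iops|floor) IOPS"'
    done

If the same 28 OSDs consistently come out at 30-50 IOPS while their LUNs sit near 100% utilization in iostat, that would point at the Dell EMC backend rather than at Ceph tuning.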
/Z

On Thu, 2 Nov 2023 at 03:40, V A Prabha <prabhav@xxxxxxx> wrote:
> Hi Eugen,
> Please find the details below.
>
> root@meghdootctr1:/var/log/ceph# ceph -s
>   cluster:
>     id:     c59da971-57d1-43bd-b2b7-865d392412a5
>     health: HEALTH_WARN
>             nodeep-scrub flag(s) set
>             544 pgs not deep-scrubbed in time
>
>   services:
>     mon: 3 daemons, quorum meghdootctr1,meghdootctr2,meghdootctr3 (age 5d)
>     mgr: meghdootctr1(active, since 5d), standbys: meghdootctr2, meghdootctr3
>     mds: 3 up:standby
>     osd: 36 osds: 36 up (since 34h), 36 in (since 34h)
>          flags nodeep-scrub
>
>   data:
>     pools:   2 pools, 544 pgs
>     objects: 10.14M objects, 39 TiB
>     usage:   116 TiB used, 63 TiB / 179 TiB avail
>     pgs:     544 active+clean
>
>   io:
>     client: 24 MiB/s rd, 16 MiB/s wr, 2.02k op/s rd, 907 op/s wr
>
> Ceph version:
>
> root@meghdootctr1:/var/log/ceph# ceph --version
> ceph version 14.2.16 (762032d6f509d5e7ee7dc008d80fe9c87086603c) nautilus (stable)
>
> ceph df -h:
> https://pastebin.com/1ffucyJg
>
> Ceph OSD performance dump:
> https://pastebin.com/1R6YQksE
>
> ceph tell osd.XX bench (out of 36 OSDs, only 8 give high IOPS values of 250+.
> Of those, 4 OSDs are from HP 3PAR and 4 from Dell EMC. We use only 4 OSDs
> from HP 3PAR and they have worked fine, without any latency or IOPS issues,
> from the beginning. The remaining 32 OSDs are from Dell EMC, of which 4
> perform much better than the other 28):
> https://pastebin.com/CixaQmBi
>
> Please help me identify whether the issue is with the Dell EMC storage, Ceph
> configuration parameter tuning, or overload in the cloud setup.
>
> On November 1, 2023 at 9:48 PM Eugen Block <eblock@xxxxxx> wrote:
> > Hi,
> >
> > for starters please add more cluster details like 'ceph status', 'ceph
> > versions', 'ceph osd df tree'. Increasing the network to 10G was the right
> > thing to do, you don't get far with 1G with real cluster load. How are
> > the OSDs configured (HDD only, SSD only or HDD with rocksdb on SSD)?
> > How is the disk utilization?
> >
> > Regards,
> > Eugen
> >
> > Zitat von prabhav@xxxxxxx:
> >
> > > In a production setup, 36 OSDs (SAS disks) totalling 180 TB are
> > > allocated to a single Ceph cluster with 3 monitors and 3 managers.
> > > There were 830 volumes and VMs created in OpenStack with Ceph as the
> > > backend. On Sep 21, users reported slowness in accessing the VMs.
> > > Analysing the logs led us to suspect the SAS disks, network congestion
> > > and the Ceph configuration (all default values were in use). We upgraded
> > > the network from 1 Gbps to 10 Gbps for the public and cluster networks.
> > > There was no change.
> > > The Ceph benchmark showed that 28 of the 36 OSDs reported very low
> > > IOPS of 30 to 50, while the remaining OSDs showed 300+ IOPS.
> > > We gradually started reducing the load on the Ceph cluster and the
> > > volume count is now 650. The slow operations have gradually reduced,
> > > but I am aware that this is not the solution.
> > > The Ceph configuration was updated to increase osd_journal_size to
> > > 10 GB and to set:
> > >   osd_max_backfills = 1
> > >   osd_recovery_max_active = 1
> > >   osd_recovery_op_priority = 1
> > >   bluestore_cache_trim_max_skip_pinned = 10000
> > >
> > > After one month we faced another issue: the mgr daemon stopped on all
> > > 3 quorum nodes and 16 OSDs went down. We could not determine the
> > > reason from the ceph-mon and ceph-mgr logs.
> > > Please guide me, as this is a production setup.
>
> Thanks & Regards,
> Ms V A Prabha
> Joint Director
> Centre for Development of Advanced Computing (C-DAC)
> "Tidel Park", 8th Floor, "D" Block (North & South)
> No. 4, Rajiv Gandhi Salai
> Taramani
> Chennai – 600113
> Ph. No.: 044-22542226/27
> Fax No.: 044-22542294
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx