I would not host multiple OSDs on a spinning drive (unless it's one of
those Seagate MACH.2 drives that have two independent heads) - head seek
time will most likely kill performance. The main reason to host multiple
OSDs on a single SSD or NVMe is typically to make use of the large IOPS
capacity, which Ceph can struggle to fully utilize with a single OSD per
drive. With spinners you usually don't have that "problem" (quite the
opposite, usually).

On Wed, 23 Mar 2022 at 19:29, Boris Behrens <bb@xxxxxxxxx> wrote:
> Good morning Istvan,
> those are rotating disks and we don't use EC. Splitting the 16TB disks
> into two 8TB partitions and running two OSDs on one disk also sounds
> interesting, but would it solve the problem?
>
> I also thought about raising the PGs for the data pool from 4096 to
> 8192, but I am not sure whether this will solve the problem or make it
> worse.
>
> Until now, nothing I've tried has worked.
>
> On Wed, 23 Mar 2022 at 05:10, Szabo, Istvan (Agoda)
> <Istvan.Szabo@xxxxxxxxx> wrote:
> >
> > Hi,
> >
> > I think you are having a similar issue to the one I had in the past.
> >
> > I have 1.6B objects on a cluster, average 40k, and all my OSDs had
> > spilled over.
> >
> > Also slow ops, wrongly marked down…
> >
> > My OSDs are 15.3TB SSDs, so my solution was to store block+db
> > together on the SSDs, put 4 OSDs per SSD and go up to 100 PGs per
> > OSD, so one disk holds roughly 400 PGs. I also turned on the balancer
> > with upmap and max deviation 1.
> >
> > I'm using EC 4:2, let's see how long it lasts. My bottleneck is
> > always the PG count: too few PGs for too many objects.
> >
> > Istvan Szabo
> > Senior Infrastructure Engineer
> > ---------------------------------------------------
> > Agoda Services Co., Ltd.
> > e: istvan.szabo@xxxxxxxxx
> > ---------------------------------------------------
> >
> > On 22 Mar 2022, at 23:34, Boris Behrens <bb@xxxxxxxxx> wrote:
> >
> > The number of 180 PGs is because of the 16TB disks. 3/4 of our OSDs
> > had cache SSDs (not NVMe though, and most of them share one SSD among
> > 10 OSDs), but this problem only appeared with Octopus.
> >
> > We also thought this might be the DB compaction, but it doesn't match
> > up. It might happen when the compaction runs, but it also seems to
> > happen during other operations like table_file_deletion, and it
> > happens on OSDs that have SSD-backed block.db devices (e.g. 5 OSDs
> > share one SAMSUNG MZ7KM1T9HAJM-00005, and the IOPS/throughput on that
> > SSD is not huge: 100 r/s, 300 w/s and around 50 MB/s r/w while
> > compacting an OSD on it).
> >
> > I also cannot reproduce it via "ceph tell osd.NN compact", so I am
> > not 100% sure it is the compaction.
> >
> > What do you mean by "grep for the latency string"?
> >
> > Cheers
> > Boris
> >
> > On Tue, 22 Mar 2022 at 15:53, Konstantin Shalygin <k0ste@xxxxxxxx>
> > wrote:
> >
> > 180 PGs per OSD is usually too much overhead, and 40k objects per PG
> > is not much, but I don't think this will work without a block.db on
> > NVMe. I think your "wrongly marked out" events happen at the time of
> > RocksDB compaction. With default log settings you can try to grep for
> > 'latency' strings.
> >
> > Also, https://tracker.ceph.com/issues/50297
> >
> > k
> > Sent from my iPhone
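
(On Boris' question above about the latency grep: I read Konstantin's
suggestion as looking at the per-OSD perf counters and the OSD log. A
rough, untested sketch - osd.12 and the log path are just examples, not
values from this thread:

    # dump the latency perf counters of one OSD, run on its host
    ceph daemon osd.12 perf dump | grep -i latency

    # or scan the OSD log for latency / slow-op complaints
    grep -iE 'latency|slow' /var/log/ceph/ceph-osd.12.log

If compaction really is the trigger, the RocksDB/BlueStore latency
counters should spike around the times an OSD gets wrongly marked down.)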
> > On 22 Mar 2022, at 14:29, Boris Behrens <bb@xxxxxxxxx> wrote:
> >
> > * the 8TB disks hold around 80-90 PGs (the 16TB ones around 160-180)
> > * per PG we have around 40k objects; 170M objects total in 1.2PiB of
> >   storage
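
If anyone wants to try the layout Istvan describes (several OSDs per
flash device plus a tight balancer), the stock tooling covers it. A
sketch only - the device name, OSD count and pool name below are
examples, not values from this thread:

    # create 4 OSDs on one large SSD/NVMe, block and db colocated
    ceph-volume lvm batch --osds-per-device 4 /dev/nvme0n1

    # balancer in upmap mode with max deviation 1
    ceph balancer mode upmap
    ceph config set mgr mgr/balancer/upmap_max_deviation 1
    ceph balancer on

    # and the PG split Boris is considering (4096 -> 8192)
    ceph osd pool set <pool> pg_num 8192

As said above, I would only do the multi-OSD-per-device part on flash;
on 16TB spinners it will most likely just multiply the head seeks.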