Good morning Istvan,

those are rotating disks and we don't use EC. Splitting the 16TB disks into
two 8TB partitions and running two OSDs on one disk also sounds interesting,
but would it solve the problem? I also thought about raising the PGs for the
data pool from 4096 to 8192, but I am not sure whether that would solve the
problem or make it worse. So far, nothing I've tried has worked. (Rough
command sketches for the options discussed are at the bottom of this mail.)

On Wed, 23 Mar 2022 at 05:10, Szabo, Istvan (Agoda)
<Istvan.Szabo@xxxxxxxxx> wrote:

> Hi,
>
> I think you are having a similar issue to the one I had in the past.
>
> I have 1.6B objects on a cluster, averaging 40k, and all my OSDs had
> spilled over.
>
> Also slow ops, OSDs wrongly marked down…
>
> My OSDs are 15.3TB SSDs, so my solution was to store block+db together on
> the SSDs, put 4 OSDs per SSD, and go up to 100 PGs/OSD, so one disk holds
> approx. 400 PGs. I also turned on the balancer with upmap and max
> deviation 1.
>
> I'm using EC 4:2; let's see how long it lasts. My bottleneck is always
> the PG number: too small a PG count for too many objects.
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---------------------------------------------------
> Agoda Services Co., Ltd.
> e: istvan.szabo@xxxxxxxxx
> ---------------------------------------------------
>
> On 2022. Mar 22., at 23:34, Boris Behrens <bb@xxxxxxxxx> wrote:
>
> The number of 180 PGs is because of the 16TB disks. 3/4 of our OSDs had
> cache SSDs (not NVMe though, and most of them are 10 OSDs on one SSD), but
> this problem only came in with Octopus.
>
> We also thought this might be the DB compaction, but it doesn't match up.
> It might happen when the compaction runs, but it looks like it also
> happens when there are other operations like table_file_deletion, and it
> happens on OSDs that have SSD-backed block.db devices (e.g. 5 OSDs share
> one SAMSUNG MZ7KM1T9HAJM-00005, and the IOPS/throughput on the SSD is not
> huge: 100 r/s and 300 w/s IOPS, and around 50 MB/s r/w throughput when
> compacting an OSD on it).
>
> I also cannot reproduce it via "ceph tell osd.NN compact", so I am not
> 100% sure it is the compaction.
>
> What do you mean by "grep for the latency string"?
>
> Cheers
> Boris
>
> On Tue, 22 Mar 2022 at 15:53, Konstantin Shalygin <k0ste@xxxxxxxx> wrote:
>
> > 180 PGs per OSD is usually overhead; also, 40k objects per PG is not
> > much, but I don't think this will work without a block.db NVMe. I think
> > your "wrong out marks" happen during RocksDB compaction. With the
> > default log settings you can try to grep for 'latency' strings.
> >
> > Also, https://tracker.ceph.com/issues/50297
> >
> > k
> > Sent from my iPhone
> >
> > > On 22 Mar 2022, at 14:29, Boris Behrens <bb@xxxxxxxxx> wrote:
> > >
> > > * the 8TB disks hold around 80-90 PGs (16TB around 160-180)
> > > * per PG we've around 40k objects; 170m objects in 1.2PiB of storage
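For reference, a rough sketch of the pg_num bump I'm considering. "data" is
a placeholder for the actual pool name, and on Octopus the autoscaler may
need to be disabled for the pool first so it doesn't revert a manual value:

  # placeholder pool name; list real ones with "ceph osd pool ls detail"
  ceph osd pool set data pg_autoscale_mode off
  ceph osd pool set data pg_num 8192
  # since Nautilus pgp_num follows pg_num automatically; it can also be
  # set explicitly:
  ceph osd pool set data pgp_num 8192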
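For the two-OSDs-per-16TB-disk idea, ceph-volume can do the split itself
without manual partitioning. /dev/sdx is a placeholder, and each disk would
have to be drained and redeployed in turn:

  # dry run: only print the proposed layout
  ceph-volume lvm batch --report --osds-per-device 2 /dev/sdx
  # actually create the two OSDs
  ceph-volume lvm batch --osds-per-device 2 /dev/sdx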
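And Istvan's balancer settings plus Konstantin's latency grep, as far as I
understand them (the grep assumes the default OSD log location):

  ceph balancer mode upmap
  ceph config set mgr mgr/balancer/upmap_max_deviation 1
  ceph balancer on
  # look for rocksdb/bluestore latency messages around the times when
  # OSDs get wrongly marked down
  grep -i latency /var/log/ceph/ceph-osd.*.log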
--
The "UTF-8 problems" self-help group will meet in the large hall this time,
as an exception.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx