https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/#confval-osd_op_queue
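For reference, a minimal sketch of what that boils down to (assuming the change should apply cluster-wide via the config database; osd_op_queue only takes effect after an OSD restart):

    # switch all OSDs back to the wpq scheduler
    ceph config set osd osd_op_queue wpq

    # verify what an individual OSD reports (osd.137 is just an example ID)
    ceph config show osd.137 osd_op_queue

    # restart the OSDs for the change to take effect
    # (e.g. on a non-cephadm host; adjust to your deployment)
    systemctl restart ceph-osd.target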
___________________________________
Clyso GmbH - Ceph Foundation Member
On 21.03.23 at 12:51, Gauvain Pocentek wrote:
(adding back the list)
On Tue, Mar 21, 2023 at 11:25 AM Joachim Kraftmayer
<joachim.kraftmayer@xxxxxxxxx> wrote:
I added the questions and answers below.
On 21.03.23 at 11:14, Gauvain Pocentek wrote:
Hi Joachim,
On Tue, Mar 21, 2023 at 10:13 AM Joachim Kraftmayer
<joachim.kraftmayer@xxxxxxxxx> wrote:
Which Ceph version are you running? Is mclock active?
We're using Quincy (17.2.5), upgraded step by step from Luminous
if I remember correctly.
Did you recreate the OSDs? If yes, at which version?
I actually don't remember all the history, but I think we added the
HDD nodes while running Pacific.
mclock seems active, set to the high_client_ops profile. HDD OSDs have
very different settings for max capacity IOPS:
osd.137 basic osd_mclock_max_capacity_iops_hdd 929.763899
osd.161 basic osd_mclock_max_capacity_iops_hdd 4754.250946
osd.222 basic osd_mclock_max_capacity_iops_hdd 540.016984
osd.281 basic osd_mclock_max_capacity_iops_hdd 1029.193945
osd.282 basic osd_mclock_max_capacity_iops_hdd 1061.762870
osd.283 basic osd_mclock_max_capacity_iops_hdd 462.984562
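A minimal sketch of how those per-OSD values can be inspected and cleared (assuming they live in the central config database, which the "basic" entries above suggest; osd.137 is just one ID from the list):

    # list every stored osd_mclock_max_capacity_iops_hdd override
    ceph config dump | grep osd_mclock_max_capacity_iops_hdd

    # show the value a single OSD is actually running with
    ceph config show osd.137 osd_mclock_max_capacity_iops_hdd

    # drop the stored value; the OSD re-measures it at its next start
    ceph config rm osd.137 osd_mclock_max_capacity_iops_hdd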
We haven't set those explicitly; could they be the reason for the
slow recovery?
I recommend disabling mclock for now, and yes, we have seen slow
recovery caused by mclock.
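If you want to keep mclock running while you look into it, one interim option it exposes (not a replacement for the recommendation above) is giving recovery a bigger share of the IOPS budget; a sketch:

    # temporarily favour recovery/backfill over client traffic
    ceph config set osd osd_mclock_profile high_recovery_ops

    # switch back to high_client_ops (the profile currently in use) once clean
    ceph config set osd osd_mclock_profile high_client_ops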
Stupid question: how do you do that? I've looked through the docs but
could only find information about changing the settings.
Bonus question: does Ceph set that itself?
Yes, and if you have a setup with HDD + SSD (DB & WAL), the automatic
discovery does not work correctly.
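A sketch of one way to sanity-check the auto-measured numbers, roughly along the lines of the mclock documentation (the bench parameters, about 12 MB of 4 KiB writes, and the value 450 are only examples):

    # run a small write benchmark against one OSD and note the reported IOPS
    ceph tell osd.137 bench 12288000 4096 4194304 100

    # if the stored value is clearly unrealistic for an HDD, pin a sane one
    ceph config set osd.137 osd_mclock_max_capacity_iops_hdd 450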
Good to know!
Gauvain
Thanks!
Gauvain
Joachim
On 21.03.23 at 06:53, Gauvain Pocentek wrote:
> Hello all,
>
> We have an EC (4+2) pool for RGW data, with HDDs + SSDs for WAL/DB. This
> pool has 9 servers, each with 12 disks of 16 TB. About 10 days ago we lost a
> server and we've removed its OSDs from the cluster. Ceph has started to
> remap and backfill as expected, but the process has been getting slower and
> slower. Today the recovery rate is around 12 MiB/s and 10 objects/s. All
> the remaining unclean PGs are backfilling:
>
>     data:
>       volumes: 1/1 healthy
>       pools:   14 pools, 14497 pgs
>       objects: 192.38M objects, 380 TiB
>       usage:   764 TiB used, 1.3 PiB / 2.1 PiB avail
>       pgs:     771559/1065561630 objects degraded (0.072%)
>                1215899/1065561630 objects misplaced (0.114%)
>                14428 active+clean
>                50    active+undersized+degraded+remapped+backfilling
>                18    active+remapped+backfilling
>                1     active+clean+scrubbing+deep
>
> We've checked the health of the remaining servers, and everything looks
> fine (CPU/RAM/network/disks).
>
> Any hints on what could be happening?
>
> Thank you,
> Gauvain
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx