On Tue, Mar 21, 2023 at 2:21 PM Clyso GmbH - Ceph Foundation Member <
joachim.kraftmayer@xxxxxxxxx> wrote:

>
> https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/#confval-osd_op_queue
>

Since this requires a restart I went another way to speed up the recovery
of degraded PGs and avoid weirdness while restarting the OSDs. I've
increased the value of osd_mclock_max_capacity_iops_hdd to a ridiculous
number for spinning disks (6000). The effect is not magical, but the
recovery went from 4 to 60 objects/s. Ceph should be back to normal in a
few hours.

I will change the osd_op_queue value once the cluster is stable.

Thanks for the help, it's been really useful, and I know a little bit more
about Ceph :)

Gauvain

> ___________________________________
> Clyso GmbH - Ceph Foundation Member
>
> On 21.03.23 at 12:51, Gauvain Pocentek wrote:
>
> (adding back the list)
>
> On Tue, Mar 21, 2023 at 11:25 AM Joachim Kraftmayer <
> joachim.kraftmayer@xxxxxxxxx> wrote:
>
>> I added the questions and answers below.
>>
>> ___________________________________
>> Best Regards,
>> Joachim Kraftmayer
>> CEO | Clyso GmbH
>>
>> Clyso GmbH
>> p: +49 89 21 55 23 91 2
>> a: Loristraße 8 | 80335 München | Germany
>> w: https://clyso.com | e: joachim.kraftmayer@xxxxxxxxx
>>
>> We are hiring: https://www.clyso.com/jobs/
>> ---
>> CEO: Dipl. Inf. (FH) Joachim Kraftmayer
>> Unternehmenssitz: Utting am Ammersee
>> Handelsregister beim Amtsgericht: Augsburg
>> Handelsregister-Nummer: HRB 25866
>> USt. ID-Nr.: DE275430677
>>
>> On 21.03.23 at 11:14, Gauvain Pocentek wrote:
>>
>> Hi Joachim,
>>
>> On Tue, Mar 21, 2023 at 10:13 AM Joachim Kraftmayer <
>> joachim.kraftmayer@xxxxxxxxx> wrote:
>>
>>> Which Ceph version are you running, and is mclock active?
>>>
>> We're using Quincy (17.2.5), upgraded step by step from Luminous if I
>> remember correctly.
>>
>> Did you recreate the OSDs? If yes, at which version?
>>
> I actually don't remember all the history, but I think we added the HDD
> nodes while running Pacific.
>
>> mclock seems active, set to the high_client_ops profile. HDD OSDs have
>> very different settings for max capacity IOPS:
>>
>> osd.137  basic  osd_mclock_max_capacity_iops_hdd   929.763899
>> osd.161  basic  osd_mclock_max_capacity_iops_hdd  4754.250946
>> osd.222  basic  osd_mclock_max_capacity_iops_hdd   540.016984
>> osd.281  basic  osd_mclock_max_capacity_iops_hdd  1029.193945
>> osd.282  basic  osd_mclock_max_capacity_iops_hdd  1061.762870
>> osd.283  basic  osd_mclock_max_capacity_iops_hdd   462.984562
>>
>> We haven't set those explicitly, could they be the reason for the slow
>> recovery?
>>
>> I recommend disabling mclock for now, and yes, we have seen slow recovery
>> caused by mclock.
>>
> Stupid question: how do you do that? I've looked through the docs but
> could only find information about changing the settings.
>
>> Bonus question: does Ceph set that itself?
>>
>> Yes, and if you have a setup with HDD + SSD (DB & WAL), the discovery does
>> not work in the right way.
>>
> Good to know!
>
> Gauvain
>
>> Thanks!
>>
>> Gauvain
>>
>>> Joachim
>>>
>>> ___________________________________
>>> Clyso GmbH - Ceph Foundation Member
>>>
>>> On 21.03.23 at 06:53, Gauvain Pocentek wrote:
>>> > Hello all,
>>> >
>>> > We have an EC (4+2) pool for RGW data, with HDDs + SSDs for WAL/DB.
>>> > This pool has 9 servers, each with 12 disks of 16 TB. About 10 days ago
>>> > we lost a server and we've removed its OSDs from the cluster.
>>> > Ceph has started to remap and backfill as expected, but the process has
>>> > been getting slower and slower. Today the recovery rate is around
>>> > 12 MiB/s and 10 objects/s. All the remaining unclean PGs are backfilling:
>>> >
>>> >   data:
>>> >     volumes: 1/1 healthy
>>> >     pools:   14 pools, 14497 pgs
>>> >     objects: 192.38M objects, 380 TiB
>>> >     usage:   764 TiB used, 1.3 PiB / 2.1 PiB avail
>>> >     pgs:     771559/1065561630 objects degraded (0.072%)
>>> >              1215899/1065561630 objects misplaced (0.114%)
>>> >              14428 active+clean
>>> >              50    active+undersized+degraded+remapped+backfilling
>>> >              18    active+remapped+backfilling
>>> >              1     active+clean+scrubbing+deep
>>> >
>>> > We've checked the health of the remaining servers, and everything looks
>>> > fine (CPU/RAM/network/disks).
>>> >
>>> > Any hints on what could be happening?
>>> >
>>> > Thank you,
>>> > Gauvain
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
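
A minimal sketch of the commands discussed in this thread, assuming a Quincy
(17.2.x) cluster; the OSD id, the 6000 IOPS figure and the cephadm restart
step are illustrative examples taken from or implied by the thread, not
recommended values:

  # Inspect the per-OSD capacity values that mclock stored automatically
  ceph config dump | grep osd_mclock_max_capacity_iops_hdd

  # Override the capacity for a single OSD (as done above), or for all OSDs at once
  ceph config set osd.137 osd_mclock_max_capacity_iops_hdd 6000
  ceph config set osd osd_mclock_max_capacity_iops_hdd 6000

  # To stop using mclock, switch the scheduler back to wpq; this only takes
  # effect once the OSDs have been restarted
  ceph config set osd osd_op_queue wpq
  ceph orch daemon restart osd.137   # assumes cephadm; otherwise restart the OSD services per host

A per-OSD override can later be removed again with
"ceph config rm osd.<id> osd_mclock_max_capacity_iops_hdd".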