Re: Very slow backfilling/remapping of EC pool PGs

https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/#confval-osd_op_queue
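
In short, a minimal sketch of what that page describes (the restart command assumes a cephadm/orchestrator-managed cluster; adjust to your setup):

    # switch the OSD op queue scheduler from mclock_scheduler back to wpq
    ceph config set osd osd_op_queue wpq
    ceph config get osd osd_op_queue

    # osd_op_queue is only read at OSD startup, so restart the OSDs afterwards,
    # one host/failure domain at a time, e.g.:
    ceph orch daemon restart osd.<id>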

___________________________________
Clyso GmbH - Ceph Foundation Member

On 21.03.23 at 12:51, Gauvain Pocentek wrote:
(adding back the list)

On Tue, Mar 21, 2023 at 11:25 AM Joachim Kraftmayer <joachim.kraftmayer@xxxxxxxxx> wrote:

    I added the questions and answers below.

    ___________________________________
    Best Regards,
    Joachim Kraftmayer
    CEO | Clyso GmbH

    Clyso GmbH
    p: +49 89 21 55 23 91 2
    a: Loristraße 8 | 80335 München | Germany
    w: https://clyso.com | e: joachim.kraftmayer@xxxxxxxxx

    We are hiring: https://www.clyso.com/jobs/
    ---
    CEO: Dipl. Inf. (FH) Joachim Kraftmayer
    Registered office: Utting am Ammersee
    Commercial register at the local court: Augsburg
    Commercial register number: HRB 25866
    VAT ID: DE275430677

    On 21.03.23 at 11:14, Gauvain Pocentek wrote:
    Hi Joachim,


    On Tue, Mar 21, 2023 at 10:13 AM Joachim Kraftmayer
    <joachim.kraftmayer@xxxxxxxxx> wrote:

        Which Ceph version are you running? Is mclock active?


    We're using Quincy (17.2.5), upgraded step by step from Luminous
    if I remember correctly.
    Did you recreate the OSDs? If so, at which version?


I actually don't remember all the history, but I think we added the HDD nodes while running Pacific.


    mclock seems active, set to the high_client_ops profile. HDD OSDs have
    very different settings for the max capacity IOPS:

    osd.137        basic osd_mclock_max_capacity_iops_hdd  929.763899
    osd.161        basic osd_mclock_max_capacity_iops_hdd  4754.250946
    osd.222        basic osd_mclock_max_capacity_iops_hdd  540.016984
    osd.281        basic osd_mclock_max_capacity_iops_hdd  1029.193945
    osd.282        basic osd_mclock_max_capacity_iops_hdd  1061.762870
    osd.283        basic osd_mclock_max_capacity_iops_hdd  462.984562

    We haven't set those explicitly; could they be the reason for the
    slow recovery?

    I recommend disabling mclock for now, and yes, we have seen slow
    recovery caused by mclock.
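
    For reference, a sketch of how to check what each OSD is actually
    using (the OSD id is just taken from the listing above):

        # scheduler and mclock profile currently in effect for one OSD
        ceph config show osd.137 osd_op_queue
        ceph config show osd.137 osd_mclock_profile

        # all per-OSD capacity values stored in the config database
        ceph config dump | grep osd_mclock_max_capacity_iops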


Stupid question: how do you do that? I've looked through the docs but could only find information about changing the settings.



    Bonus question: does Ceph set that itself?
    Yes, and if you have a setup with HDD + SSD (DB & WAL), the automatic
    discovery does not work correctly.
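
    For completeness, a minimal sketch of how such stored values can be
    cleared or overridden if you stay on mclock (the OSD id and the 500
    figure are only examples; a cleared value may be measured and set
    again when the OSD restarts):

        # drop the auto-detected value for one OSD
        ceph config rm osd.137 osd_mclock_max_capacity_iops_hdd

        # or override it with a value you have benchmarked yourself
        ceph config set osd.137 osd_mclock_max_capacity_iops_hdd 500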


Good to know!


Gauvain


    Thanks!

    Gauvain


        Joachim

        ___________________________________
        Clyso GmbH - Ceph Foundation Member

        On 21.03.23 at 06:53, Gauvain Pocentek wrote:
        > Hello all,
        >
        > We have an EC (4+2) pool for RGW data, with HDDs + SSDs for
        > WAL/DB. This pool has 9 servers, each with 12 disks of 16 TB.
        > About 10 days ago we lost a server and we've removed its OSDs
        > from the cluster. Ceph has started to remap and backfill as
        > expected, but the process has been getting slower and slower.
        > Today the recovery rate is around 12 MiB/s and 10 objects/s.
        > All the remaining unclean PGs are backfilling:
        >
        >    data:
        >      volumes: 1/1 healthy
        >      pools:   14 pools, 14497 pgs
        >      objects: 192.38M objects, 380 TiB
        >      usage:   764 TiB used, 1.3 PiB / 2.1 PiB avail
        >      pgs:     771559/1065561630 objects degraded (0.072%)
        >               1215899/1065561630 objects misplaced (0.114%)
        >               14428 active+clean
        >               50    active+undersized+degraded+remapped+backfilling
        >               18    active+remapped+backfilling
        >               1     active+clean+scrubbing+deep
        >
        > We've checked the health of the remaining servers, and everything
        > looks fine (CPU/RAM/network/disks).
        >
        > Any hints on what could be happening?
        >
        > Thank you,
        > Gauvain
        > _______________________________________________
        > ceph-users mailing list -- ceph-users@xxxxxxx
        > To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



