Re: Drained OSDs are still ACTIVE_PRIMARY - causing high IO latency on clients

Denis Polom <denispolom@xxxxxxxxx> · Fri, 20 May 2022 20:18:13 +0200

Hi,

no pool is EC.

Primary affinity works in Octopus on a replicated pool.

A Nautilus EC pool works.

On 5/20/22 19:25, denispolom@xxxxxxxxx wrote:
Hi,

no pool is EC.

20. 5. 2022 18:19:22 Dan van der Ster <dvanders@xxxxxxxxx>:

    Hi,

    Just a curiosity... It looks like you're comparing an EC pool in
    octopus to a replicated pool in nautilus. Does primary affinity
    work for you in octopus on a replicated pool? And does a nautilus
    EC pool work?

    .. Dan

    On Fri., May 20, 2022, 13:53 Denis Polom, <denispolom@xxxxxxxxx>
    wrote:

        Hi

        I have observed high latencies and hanging mount points while
        draining an OSD since the Octopus release, and the problem is
        still present on the latest Pacific.

        Cluster setup:

        Ceph Pacific 16.2.7

        Cephfs with EC data pool

        EC profile setup:

        crush-device-class=
        crush-failure-domain=host
        crush-root=default
        jerasure-per-chunk-alignment=false
        k=10
        m=2
        plugin=jerasure
        technique=reed_sol_van
        w=8
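A minimal sketch of what the k and m values in this profile imply, assuming the usual erasure-coding arithmetic (k data chunks plus m coding chunks per stripe). The helper name `ec_profile_summary` is hypothetical and is not part of Ceph.

```python
def ec_profile_summary(k, m):
    """Summarize the basic properties of an erasure-code profile.

    k: number of data chunks, m: number of coding chunks.
    """
    chunks = k + m
    return {
        # Each stripe is split into k data chunks plus m coding chunks.
        "chunks_per_stripe": chunks,
        # Up to m chunks can be lost without losing data.
        "tolerated_failures": m,
        # Raw space consumed per byte of usable data.
        "raw_to_usable_ratio": chunks / k,
        # With crush-failure-domain=host, each chunk goes to a different host.
        "min_hosts": chunks,
    }


summary = ec_profile_summary(10, 2)
print(summary)
```

For k=10 and m=2 this gives 12 chunks per stripe, tolerance for 2 lost chunks, and a raw-to-usable ratio of 1.2.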

        Description:

        If a drive is broken, we remove it from the Ceph cluster by
        draining it first, which means setting its CRUSH weight to 0:

        ceph osd crush reweight osd.1 0
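The drain step above could be wrapped in a small Python helper. This is only a sketch: the function names `drain_command` and `drain` are hypothetical, and the `dry_run` default is an assumption added so the command can be inspected before it touches a cluster. The only Ceph command used is the one shown above.

```python
import shlex
import subprocess


def drain_command(osd_id):
    """Build the argv that sets an OSD's CRUSH weight to 0."""
    if osd_id < 0:
        raise ValueError("OSD id must be non-negative")
    return ["ceph", "osd", "crush", "reweight", f"osd.{osd_id}", "0"]


def drain(osd_id, dry_run=True):
    """Return the command line; run it only when dry_run is False."""
    cmd = drain_command(osd_id)
    if not dry_run:
        # Raises CalledProcessError if the ceph CLI reports a failure.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


print(drain(1))
```

With the default `dry_run=True` nothing is executed; the call just returns the command line for review.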

        On Nautilus this normally didn't affect clients. But after the
        upgrade to Octopus (and from Octopus through the current Pacific
        release) I can observe very high IO latencies on clients (10 sec
        and higher) while an OSD is being drained.

        By debugging I found out that the drained OSD is still listed as
        ACTING_PRIMARY, and that this happens only on EC pools and only
        since Octopus. To be sure, I tested it again on Nautilus, where
        the behavior is correct and the drained OSD is not listed under
        the UP and ACTING OSDs for PGs.

        Even setting primary-affinity to 0 for the given OSD has no
        effect on an EC pool.

        Below is my debug output:

        Buggy behavior on Octopus and Pacific:

        Before draining osd.70:

        PG_STAT             16.1fff
        OBJECTS             2269
        MISSING_ON_PRIMARY  0
        DEGRADED            0
        MISPLACED           0
        UNFOUND             0
        BYTES               8955297727
        OMAP_BYTES*         0
        OMAP_KEYS*          0
        LOG                 2449
        DISK_LOG            2449
        STATE               active+clean
        STATE_STAMP         2022-05-19T08:41:55.241734+0200
        VERSION             19403690'275685
        REPORTED            19407588:19607199
        UP                  [70,206,216,375,307,57]
        UP_PRIMARY          70
        ACTING              [70,206,216,375,307,57]
        ACTING_PRIMARY      70
        LAST_SCRUB          19384365'275621
        SCRUB_STAMP         2022-05-19T08:41:55.241493+0200
        LAST_DEEP_SCRUB     19384365'275621
        DEEP_SCRUB_STAMP    2022-05-19T08:41:55.241493+0200
        SNAPTRIMQ_LEN       0
        dumped pgs

        after setting osd.70 crush weight to 0 (osd.70 is still acting
        primary):

        PG_STAT             16.1fff
        OBJECTS             2269
        MISSING_ON_PRIMARY  0
        DEGRADED            0
        MISPLACED           2269
        UNFOUND             0
        BYTES               8955297727
        OMAP_BYTES*         0
        OMAP_KEYS*          0
        LOG                 2449
        DISK_LOG            2449
        STATE               active+remapped+backfill_wait
        STATE_STAMP         2022-05-20T08:51:54.249071+0200
        VERSION             19403690'275685
        REPORTED            19407668:19607289
        UP                  [71,206,216,375,307,57]
        UP_PRIMARY          71
        ACTING              [70,206,216,375,307,57]
        ACTING_PRIMARY      70
        LAST_SCRUB          19384365'275621
        SCRUB_STAMP         2022-05-19T08:41:55.241493+0200
        LAST_DEEP_SCRUB     19384365'275621
        DEEP_SCRUB_STAMP    2022-05-19T08:41:55.241493+0200
        SNAPTRIMQ_LEN       0
        dumped pgs
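The check done by eye on the row above can be sketched in Python: pull the UP and ACTING sets out of one `ceph pg dump pgs` row and test whether the drained OSD still appears in them. The function names `up_and_acting` and `drain_status` are hypothetical, and the regex assumes every shard maps to a numeric OSD id (a row with an unmapped shard would need extra handling).

```python
import re

# Matches an OSD set such as "[70,206,216]" followed by its primary OSD id.
OSD_SET = re.compile(r"\[([\d,]+)\]\s+(\d+)")


def up_and_acting(pg_row):
    """Return ((up, up_primary), (acting, acting_primary)) for one PG row."""
    matches = OSD_SET.findall(pg_row)
    if len(matches) != 2:
        raise ValueError("expected exactly two OSD sets (UP and ACTING)")
    parsed = [
        ([int(osd) for osd in osds.split(",")], int(primary))
        for osds, primary in matches
    ]
    return parsed[0], parsed[1]


def drain_status(pg_row, drained_osd):
    """Report where a drained OSD still shows up for this PG."""
    (up, _up_primary), (acting, acting_primary) = up_and_acting(pg_row)
    return {
        "in_up": drained_osd in up,
        "in_acting": drained_osd in acting,
        "is_acting_primary": acting_primary == drained_osd,
    }


# The Pacific row for PG 16.1fff after osd.70 was reweighted to 0.
row = (
    "16.1fff 2269 0 0 2269 0 8955297727 0 0 2449 2449 "
    "active+remapped+backfill_wait 2022-05-20T08:51:54.249071+0200 "
    "19403690'275685 19407668:19607289 "
    "[71,206,216,375,307,57] 71 [70,206,216,375,307,57] 70 "
    "19384365'275621 2022-05-19T08:41:55.241493+0200 "
    "19384365'275621 2022-05-19T08:41:55.241493+0200 0"
)
status = drain_status(row, 70)
print(status)
```

For this row the drained osd.70 is no longer in the UP set but is still in the ACTING set and is still the acting primary, which matches the behavior described.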

        Correct behavior on Nautilus:

        Before draining osd.10:

        PG_STAT             2.4e
        OBJECTS             2
        MISSING_ON_PRIMARY  0
        DEGRADED            0
        MISPLACED           0
        UNFOUND             0
        BYTES               8388608
        OMAP_BYTES*         0
        OMAP_KEYS*          0
        LOG                 2
        DISK_LOG            2
        STATE               active+clean
        STATE_STAMP         2022-05-20 02:13:47.432104
        VERSION             61'2
        REPORTED            75:40
        UP                  [10,0,7]
        UP_PRIMARY          10
        ACTING              [10,0,7]
        ACTING_PRIMARY      10
        LAST_SCRUB          0'0
        SCRUB_STAMP         2022-05-20 01:44:36.217286
        LAST_DEEP_SCRUB     0'0
        DEEP_SCRUB_STAMP    2022-05-20 01:44:36.217286
        SNAPTRIMQ_LEN       0

        after setting osd.10 crush weight to 0 (behavior is correct,
        osd.10 is
        not listed, not used):

        root@nautilus1:~# ceph pg dump pgs | head -2
        PG_STAT             2.4e
        OBJECTS             14
        MISSING_ON_PRIMARY  0
        DEGRADED            0
        MISPLACED           0
        UNFOUND             0
        BYTES               58720256
        OMAP_BYTES*         0
        OMAP_KEYS*          0
        LOG                 18
        DISK_LOG            18
        STATE               active+clean
        STATE_STAMP         2022-05-20 02:18:59.414812
        VERSION             75'18
        REPORTED            80:43
        UP                  [22,0,7]
        UP_PRIMARY          22
        ACTING              [22,0,7]
        ACTING_PRIMARY      22
        LAST_SCRUB          0'0
        SCRUB_STAMP         2022-05-20 01:44:36.217286
        LAST_DEEP_SCRUB     0'0
        DEEP_SCRUB_STAMP    2022-05-20 01:44:36.217286
        SNAPTRIMQ_LEN       0

        Now the question is: is this an intended feature, or is it a bug?

        Thank you!

        _______________________________________________
        ceph-users mailing list -- ceph-users@xxxxxxx
        To unsubscribe send an email to ceph-users-leave@xxxxxxx
