Hi Tyler,

right now, the EC rule looks like this:

rule fs-data {
        id 1
        type erasure
        min_size 3
        max_size 6
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step choose indep 0 type osd
        step emit
}

I guess you are indicating it might be this: https://docs.ceph.com/en/octopus/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon ?

It's a PG on a 4+2 EC pool with failure domain OSD. We have 3 hosts with 3 OSDs each, so 9 in total. Right now 8 are up+in. According to that description, we don't have the situation of k+m = #OSDs. Is this the issue you have in mind, or is it something else? I think CRUSH should be able to find a mapping (see the crushtool sketch at the end of this thread). I did tests with the OSDs after cluster creation, stopping an OSD and setting it down+out, and it worked fine. The only difference now is that I have data and load on the cluster.

I have seen something like this before on our mimic cluster: 1 PG can get stuck in an unclean state for a very long time for no apparent reason. I asked about this in the group, but never got a reply.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Tyler Stachecki <stachecki.tyler@xxxxxxxxx>
Sent: 27 August 2022 19:27:36
To: Frank Schilder
Cc: ceph-users
Subject: Re: 1 PG remains remapped after recovery

You seem to have an OSD that's down and out (status says 9 OSDs, 8 up and in). One possibility is that the PG is not able to fully recover because of the existing CRUSH rules and the fact that the only OSD that could store the last replica is down and out.

So, what do your CRUSH rules and replication look like?

Tyler

On Sat, Aug 27, 2022, 1:20 PM Frank Schilder <frans@xxxxxx> wrote:
Hi all,

our test cluster (octopus 15.2.16) ended up in a weird state:

  cluster:
    id:     bf1f51f5-b381-4cf7-b3db-88d044c1960c
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum tceph-01,tceph-03,tceph-02 (age 4w)
    mgr: tceph-01(active, since 4w), standbys: tceph-02, tceph-03
    mds: fs:1 {0=tceph-02=up:active} 2 up:standby
    osd: 9 osds: 8 up (since 29h), 8 in (since 28h); 1 remapped pgs

  data:
    pools:   4 pools, 321 pgs
    objects: 10.40M objects, 348 GiB
    usage:   1.7 TiB used, 442 GiB / 2.2 TiB avail
    pgs:     39434/46694661 objects misplaced (0.084%)
             205 active+clean+snaptrim_wait
             99  active+clean
             16  active+clean+snaptrim
             1   active+clean+remapped+snaptrim_wait

  io:
    client:   19 KiB/s rd, 22 MiB/s wr, 2 op/s rd, 174 op/s wr

As part of the testing we failed an OSD to benchmark client IO under recovery. Strangely enough, after the cluster recovered, 1 PG remains in state remapped. Despite that, health is OK. This seems problematic, because the PG will probably accumulate PG log entries until the remapped state is cleared. The history versions already look wildly different. Here is the full PG state:

PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES  OMAP_BYTES*  OMAP_KEYS*  LOG  DISK_LOG  STATE  STATE_STAMP  VERSION  REPORTED  UP  UP_PRIMARY  ACTING  ACTING_PRIMARY  LAST_SCRUB  SCRUB_STAMP  LAST_DEEP_SCRUB  DEEP_SCRUB_STAMP  SNAPTRIMQ_LEN
4.1c  39438  0  0  39438  0  2825704691  0  0  1933  1933  active+clean+remapped+snaptrim_wait  2022-08-27T19:05:15.144083+0200  4170'3108053  4170:3022415  [6,1,4,5,3,NONE]  6  [6,1,4,5,3,1]  6  3312'2843531  2022-08-24T22:40:42.482024+0200  2832'2067159  2022-08-21T02:13:17.023702+0200  49

Any ideas why this PG is stuck in remapped and does not rebalance objects? Is there a way to convince it to start rebalancing?
Thanks and Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
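
A minimal sketch of the offline crushtool check mentioned above, following the "CRUSH gives up too soon" procedure from the linked Octopus troubleshooting page. The file names are illustrative; rule id 1 and --num-rep 6 are taken from the fs-data rule and the 4+2 profile quoted above, and "--weight 7 0" stands in for whichever OSD is actually down+out in your cluster:

    # look at why the PG stays remapped (compare up set vs. acting set)
    ceph pg 4.1c query

    # export and decompile the current crushmap
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt

    # test rule 1 for a 6-wide (4+2) pool over many sample inputs;
    # zero-weight the down+out OSD to reproduce the 8-OSD situation
    crushtool -i crushmap.bin --test --show-bad-mappings \
        --rule 1 --num-rep 6 --min-x 0 --max-x 10000 --weight 7 0

    # if bad mappings show up, raise "step set_choose_tries" in
    # crushmap.txt (e.g. from 100 to 200), recompile, re-test, and
    # only then inject the new map
    crushtool -c crushmap.txt -o crushmap.new
    crushtool -i crushmap.new --test --show-bad-mappings \
        --rule 1 --num-rep 6 --min-x 0 --max-x 10000 --weight 7 0
    ceph osd setcrushmap -i crushmap.new

If crushtool already finds complete 6-OSD mappings with the failed OSD weighted out, the rule is probably not giving up too soon, which would point back at the stuck-remapped behaviour described above rather than at the CRUSH tunables.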