You seem to have an OSD that's down and out (status says 9 OSDs, 8 up and
in). One possibility is that the PG cannot fully recover because of the
existing CRUSH rules and the fact that the only OSD that could store the
last replica is down and out.

So, what do your CRUSH rules and replication look like?

Tyler

On Sat, Aug 27, 2022, 1:20 PM Frank Schilder <frans@xxxxxx> wrote:

> Hi all,
>
> our test cluster (octopus 15.2.16) ended up in a weird state:
>
>   cluster:
>     id:     bf1f51f5-b381-4cf7-b3db-88d044c1960c
>     health: HEALTH_OK
>
>   services:
>     mon: 3 daemons, quorum tceph-01,tceph-03,tceph-02 (age 4w)
>     mgr: tceph-01(active, since 4w), standbys: tceph-02, tceph-03
>     mds: fs:1 {0=tceph-02=up:active} 2 up:standby
>     osd: 9 osds: 8 up (since 29h), 8 in (since 28h); 1 remapped pgs
>
>   data:
>     pools:   4 pools, 321 pgs
>     objects: 10.40M objects, 348 GiB
>     usage:   1.7 TiB used, 442 GiB / 2.2 TiB avail
>     pgs:     39434/46694661 objects misplaced (0.084%)
>              205 active+clean+snaptrim_wait
>              99  active+clean
>              16  active+clean+snaptrim
>              1   active+clean+remapped+snaptrim_wait
>
>   io:
>     client:   19 KiB/s rd, 22 MiB/s wr, 2 op/s rd, 174 op/s wr
>
> As part of the testing we failed an OSD to benchmark client IO under
> recovery. Strangely enough, after the cluster recovered, 1 PG remains in
> state remapped. Despite that, health is OK. This seems problematic, because
> the PG will probably accumulate PG_LOG entries until the remapped state is
> cleared. The history versions already look wildly different.
> Here is the full PG state:
>
> PG_STAT             4.1c
> OBJECTS             39438
> MISSING_ON_PRIMARY  0
> DEGRADED            0
> MISPLACED           39438
> UNFOUND             0
> BYTES               2825704691
> OMAP_BYTES*         0
> OMAP_KEYS*          0
> LOG                 1933
> DISK_LOG            1933
> STATE               active+clean+remapped+snaptrim_wait
> STATE_STAMP         2022-08-27T19:05:15.144083+0200
> VERSION             4170'3108053
> REPORTED            4170:3022415
> UP                  [6,1,4,5,3,NONE]
> UP_PRIMARY          6
> ACTING              [6,1,4,5,3,1]
> ACTING_PRIMARY      6
> LAST_SCRUB          3312'2843531
> SCRUB_STAMP         2022-08-24T22:40:42.482024+0200
> LAST_DEEP_SCRUB     2832'2067159
> DEEP_SCRUB_STAMP    2022-08-21T02:13:17.023702+0200
> SNAPTRIMQ_LEN       49
>
> Any ideas why this PG is stuck in remapped and does not rebalance objects?
> Is there a way to convince it to start rebalancing?
>
> Thanks and Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
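
For what it's worth, the mismatch in the quoted PG state is between the UP
set [6,1,4,5,3,NONE] and the ACTING set [6,1,4,5,3,1]. A minimal Python
sketch that lists the positions where the two differ is below. The function
names are hypothetical, and it assumes the bracketed sets are pasted as text
from the PG dump; reading NONE as "CRUSH found no OSD for this slot" is my
understanding and not something the dump itself states.

```python
def parse_osd_set(text):
    """Parse a set like '[6,1,4,5,3,NONE]' into a list of ints.

    Slots printed as NONE become Python None.
    """
    items = text.strip().strip("[]").split(",")
    return [None if item.strip() == "NONE" else int(item) for item in items]


def remapped_positions(up, acting):
    """Return (position, up_osd, acting_osd) for every slot that differs."""
    return [(i, u, a) for i, (u, a) in enumerate(zip(up, acting)) if u != a]


# Values copied from the PG 4.1c row quoted above.
up = parse_osd_set("[6,1,4,5,3,NONE]")
acting = parse_osd_set("[6,1,4,5,3,1]")

for pos, up_osd, acting_osd in remapped_positions(up, acting):
    print(f"position {pos}: up={up_osd} acting={acting_osd}")
```

Only the last position differs: the up set has no OSD there, while the
acting set uses OSD 1. That would be consistent with the rule needing six
distinct placements while only eight OSDs are in. The CRUSH rules and pool
settings asked about can be listed with `ceph osd crush rule dump` and
`ceph osd pool ls detail`.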