Hi all,

our test cluster (octopus 15.2.16) ended up in a weird state:

  cluster:
    id:     bf1f51f5-b381-4cf7-b3db-88d044c1960c
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum tceph-01,tceph-03,tceph-02 (age 4w)
    mgr: tceph-01(active, since 4w), standbys: tceph-02, tceph-03
    mds: fs:1 {0=tceph-02=up:active} 2 up:standby
    osd: 9 osds: 8 up (since 29h), 8 in (since 28h); 1 remapped pgs

  data:
    pools:   4 pools, 321 pgs
    objects: 10.40M objects, 348 GiB
    usage:   1.7 TiB used, 442 GiB / 2.2 TiB avail
    pgs:     39434/46694661 objects misplaced (0.084%)
             205 active+clean+snaptrim_wait
             99  active+clean
             16  active+clean+snaptrim
             1   active+clean+remapped+snaptrim_wait

  io:
    client:   19 KiB/s rd, 22 MiB/s wr, 2 op/s rd, 174 op/s wr

As part of the testing, we failed an OSD to benchmark client IO under recovery. Strangely enough, after the cluster recovered, one PG remains in the remapped state. Despite that, health is OK. This seems problematic, because the PG will probably keep accumulating PG_LOG entries until the remapped state is cleared. The history versions already look wildly different.

Here is the full PG state:

PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES       OMAP_BYTES*  OMAP_KEYS*  LOG   DISK_LOG  STATE                                STATE_STAMP                      VERSION       REPORTED      UP                UP_PRIMARY  ACTING         ACTING_PRIMARY  LAST_SCRUB    SCRUB_STAMP                      LAST_DEEP_SCRUB  DEEP_SCRUB_STAMP                 SNAPTRIMQ_LEN
4.1c     39438    0                   0         39438      0        2825704691  0            0           1933  1933      active+clean+remapped+snaptrim_wait  2022-08-27T19:05:15.144083+0200  4170'3108053  4170:3022415  [6,1,4,5,3,NONE]  6           [6,1,4,5,3,1]  6               3312'2843531  2022-08-24T22:40:42.482024+0200  2832'2067159     2022-08-21T02:13:17.023702+0200  49

Any ideas why this PG is stuck in remapped and does not rebalance objects? Is there a way to convince it to start rebalancing?

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
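
P.S. In case more data would help, here is a sketch of the commands I could pull output from (all standard Octopus CLI). The comments name possible causes (CRUSH failing to fill the up set, or a leftover upmap entry); these are guesses on my part, not something I have confirmed:

  # Peering and mapping details for the PG; a NONE in the up set
  # usually means CRUSH could not choose a full set of OSDs:
  ceph pg 4.1c query
  ceph pg map 4.1c

  # Which pool and CRUSH rule does 4.1c use, and can the rule still be
  # satisfied (replica count / EC width vs. remaining hosts and OSDs)?
  ceph osd pool ls detail
  ceph osd crush rule dump
  ceph osd df tree

  # Check for a stale pg_upmap_items entry pinning the PG to the failed OSD;
  # if one exists, removing it would let the PG remap:
  ceph osd dump | grep 4.1c
  # ceph osd rm-pg-upmap-items 4.1c

  # As a gentler nudge, forcing the PG to re-peer sometimes clears a stuck mapping:
  ceph pg repeer 4.1c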