Hi, I found an `active+recovery_unfound+undersized+degraded+remapped` pg after restarting all nodes one by one. Could anyone give some hints why this problem happened and how to restore my data? I read some documents and searched Ceph issues, but I couldn't find enough information. https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/#unfound-objects https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html-single/troubleshooting_guide/index#unfound-objects_diag All OSDs are `IN` and `UP`. In addition, no OSD are added/removed during node reboot. I know I can use `ceph pg mark_unfound_lost` as a last resort, but I hesitate to do that is because the lost PG is a part of RGW's bucket index. # additional information ## softwares - Ceph: v16.2.4 - Rook: v1.6.3 ## the result of some commands ### ceph -s ```command ceph -s cluster: id: b160a475-c579-46a2-9346-416d3a229c5f health: HEALTH_ERR 8/47565926 objects unfound (0.000%) Possible data damage: 1 pg recovery_unfound Degraded data redundancy: 142103/142697778 objects degraded (0.100%), 1 pg degraded, 1 pg undersized 1 daemons have recently crashed services: mon: 3 daemons, quorum eb,ef,eg (age 9h) mgr: a(active, since 9h), standbys: b osd: 18 osds: 18 up (since 13h), 18 in (since 13h); 1 remapped pgs rgw: 3 daemons active (3 hosts, 1 zones) data: pools: 12 pools, 185 pgs objects: 47.57M objects, 3.1 TiB usage: 20 TiB used, 111 TiB / 131 TiB avail pgs: 142103/142697778 objects degraded (0.100%) 8/47565926 objects unfound (0.000%) 181 active+clean 2 active+clean+scrubbing+deep+repair 1 active+clean+scrubbing 1 active+recovery_unfound+undersized+degraded+remapped io: client: 114 KiB/s rd, 5.7 MiB/s wr, 22 op/s rd, 279 op/s wr ``` ### ceph health detail ``` ceph health detail HEALTH_ERR 8/47565953 objects unfound (0.000%); Possible data damage: 1 pg recovery_unfound; Degraded data redundancy: 142103/142697859 objects degraded (0.100%), 1 pg degraded, 1 pg undersized; 1 daemons have recently crashed [WRN] OBJECT_UNFOUND: 8/47565953 objects unfound (0.000%) pg 10.1 has 8 unfound objects [ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound pg 10.1 is active+recovery_unfound+undersized+degraded+remapped, acting [3,13], 8 unfound [WRN] PG_DEGRADED: Degraded data redundancy: 142103/142697859 objects degraded (0.100%), 1 pg degraded, 1 pg undersized pg 10.1 is stuck undersized for 14h, current state active+recovery_unfound+undersized+degraded+remapped, last acting [3,13] [WRN] RECENT_CRASH: 1 daemons have recently crashed client.rgw.ceph.hdd.object.store.a crashed on host rook-ceph-rgw-ceph-hdd-object-store-a-6d5d75c87c-z9rxz at 2021-08-16T17:24:05.471655Z ``` ### ceph pg query ``` ceph pg 10.1 query ... "up": [ 11, 13, 3 ], "acting": [ 3, 13 ], "backfill_targets": [ "11" ], "acting_recovery_backfill": [ "3", "11", "13" ], ... "recovery_state": [ { "name": "Started/Primary/Active", "enter_time": "2021-08-16T11:24:12.402345+0000", "might_have_unfound": [ { "osd": "2", "status": "already probed" }, { "osd": "7", "status": "already probed" }, { "osd": "11", "status": "already probed" }, { "osd": "12", "status": "already probed" }, { "osd": "13", "status": "already probed" }, { "osd": "15", "status": "already probed" } ], ``` ### ceph pg ls ``` ceph pg ls | grep -v 'active+clean' PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP 10.1 142087 142103 0 8 0 87077343246 300322300 9716 active+recovery_unfound+undersized+degraded+remapped 14h 46437'28097849 46437:121092131 [11,13,3]p11 [3,13]p3 2021-08-15T00:10:21.615494+0000 2021-08-10T16:42:39.172001+0000 ``` ### ceph osd tree ``` ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 130.99301 root default -4 43.66434 zone rack0 -3 7.27739 host 10-69-0-22 0 hdd 7.27739 osd.0 up 1.00000 1.00000 -19 7.27739 host 10-69-0-23 6 hdd 7.27739 osd.6 up 1.00000 1.00000 -7 7.27739 host 10-69-0-24 1 hdd 7.27739 osd.1 up 1.00000 1.00000 -27 7.27739 host 10-69-0-28 13 hdd 7.27739 osd.13 up 1.00000 1.00000 -9 7.27739 host 10-69-0-29 2 hdd 7.27739 osd.2 up 1.00000 1.00000 -15 7.27739 host 10-69-0-30 4 hdd 7.27739 osd.4 up 1.00000 1.00000 -24 43.66434 zone rack1 -41 0 host 10-69-0-214 -33 7.27739 host 10-69-0-215 12 hdd 7.27739 osd.12 up 1.00000 1.00000 -35 0 host 10-69-0-217 -45 7.27739 host 10-69-0-218 9 hdd 7.27739 osd.9 up 1.00000 1.00000 -29 14.55478 host 10-69-0-220 10 hdd 7.27739 osd.10 up 1.00000 1.00000 17 hdd 7.27739 osd.17 up 1.00000 1.00000 -23 7.27739 host 10-69-0-221 8 hdd 7.27739 osd.8 up 1.00000 1.00000 -31 7.27739 host 10-69-0-222 11 hdd 7.27739 osd.11 up 1.00000 1.00000 -12 43.66434 zone rack2 -21 7.27739 host 10-69-1-151 ``` Thanks, Satoru _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx