Hi Paul,

Many thanks for your helpful suggestions.

Yes, we have 13 pgs with "might_have_unfound" entries.
(also 1 pg without "might_have_unfound" stuck in the
active+recovery_unfound+degraded+repair state)

Taking one pg with unfound objects:

[root@ceph1 ~]# ceph health detail | grep 5.5c9
    pg 5.5c9 has 2 unfound objects
    pg 5.5c9 is active+recovery_unfound+degraded, acting [347,442,381,215,91,260,31,94,178,302], 2 unfound
    pg 5.5c9 is active+recovery_unfound+degraded, acting [347,442,381,215,91,260,31,94,178,302], 2 unfound
    pg 5.5c9 not deep-scrubbed since 2020-01-16 08:05:43.119336
    pg 5.5c9 not scrubbed since 2020-01-16 08:05:43.119336

Checking the state:

[root@ceph1 ~]# ceph pg 5.5c9 query | jq .recovery_state
[
  {
    "name": "Started/Primary/Active",
    "enter_time": "2020-02-03 09:57:30.982038",
    "might_have_unfound": [
      {
        "osd": "31(6)",
        "status": "already probed"
      },
      {
        "osd": "91(4)",
        "status": "already probed"
      },
      {
        "osd": "94(7)",
        "status": "already probed"
      },
      {
        "osd": "178(8)",
        "status": "already probed"
      },
      {
        "osd": "215(3)",
        "status": "already probed"
      },
      {
        "osd": "260(5)",
        "status": "already probed"
      },
      {
        "osd": "302(9)",
        "status": "already probed"
      },
      {
        "osd": "381(2)",
        "status": "already probed"
      },
      {
        "osd": "442(1)",
        "status": "already probed"
      }
    ],
    "recovery_progress": {
      "backfill_targets": [],
      "waiting_on_backfill": [],
      "last_backfill_started": "MIN",
      "backfill_info": {
        "begin": "MIN",
        "end": "MIN",
        "objects": []
      },
      "peer_backfill_info": [],
      "backfills_in_flight": [],
      "recovering": [],
      "pg_backend": {
        "recovery_ops": [],
        "read_ops": []
      }
    },
    "scrub": {
      "scrubber.epoch_start": "0",
      "scrubber.active": false,
      "scrubber.state": "INACTIVE",
      "scrubber.start": "MIN",
      "scrubber.end": "MIN",
      "scrubber.max_end": "MIN",
      "scrubber.subset_last_update": "0'0",
      "scrubber.deep": false,
      "scrubber.waiting_on_whom": []
    }
  },
  {
    "name": "Started",
    "enter_time": "2020-02-03 09:57:29.788310"
  }
]

-----------------------------------------------------

Taking your advice, I restarted the primary OSD for this pg:

[root@ceph1 ~]# ceph osd down 347

This doesn't change the output of "ceph pg 5.5c9 query", apart from
updating the Started time, and ceph health still shows unfound objects.

To fix this, do we need to issue a scrub (or deep scrub) so that the
objects can be found?

Just in case, I've issued a manual scrub:

[root@ceph1 ~]# ceph pg scrub 5.5c9
instructing pg 5.5c9s0 on osd.347 to scrub

The cluster is currently busy deleting snapshots, so it may take a
while before the scrub starts.

best regards,

Jake

On 2/3/20 6:31 PM, Paul Emmerich wrote:
> This might be related to recent problems with OSDs not being queried
> for unfound objects properly in some cases (which I think was fixed in
> master?)
>
> Anyways: run ceph pg <pg> query on the affected PGs, check for "might
> have unfound" and try restarting the OSDs mentioned there. Probably
> also sufficient to just run "ceph osd down" on the primaries of the
> affected PGs to get them to re-check.
>
>
> Paul

--
Jake Grimmett
MRC Laboratory of Molecular Biology
Francis Crick Avenue,
Cambridge CB2 0QH, UK.
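
To work through all 13 affected PGs in one pass, a small loop along the
following lines could be used. This is only a rough sketch, not taken from
the thread above: it assumes jq is installed and simply scrapes the PG ids
from the "pg ... has ... unfound objects" lines of "ceph health detail",
then prints each PG's "might_have_unfound" list from "ceph pg query".

    # Sketch (untested): for every PG reporting unfound objects, show
    # which OSDs it thinks might still hold the missing copies.
    for pg in $(ceph health detail | awk '/ has .* unfound object/ {print $2}'); do
        echo "== ${pg} =="
        ceph pg "${pg}" query | \
            jq '.recovery_state[] | select(.might_have_unfound != null) | .might_have_unfound'
    done

The OSDs listed there (or just the primary of each PG, as Paul suggests)
could then be marked down with "ceph osd down" to trigger a re-probe.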