Re: Degraded PG does not discover remapped data on originating OSD

On 13/12/2018 13.38, Jonas Jelten wrote:
> 
> I have now tested this on a 5-node, 20-OSD, 3-replica-only cluster.
> Easy steps to reproduce seem to be:
> 
> * Have a healthy cluster
> * ceph osd set pause                                # make sure no writes mess up the test
> * ceph osd set nobackfill
> * ceph osd set norecover                            # make sure the error is not recovered but instead stays
> * ceph tell 'osd.*' injectargs '--debug_osd=20/20'  # turn up logging
> * ceph osd out $osdid # take out a random osd
> * Observe the state: objects are already degraded. Check pg query.
>   In my test I observe that $osdid was "already probed", even though it does have the data
>   and the cluster was completely healthy before.
> * ceph osd down $osdid                              # re-peer this OSD; it will come up again right away
> * Observe the state again: even more objects are degraded now. Check pg query.
>   In my test, $osdid is now "not queried".
> * ceph osd in $osdid                                # everything turns back to normal and healthy
> * ceph tell 'osd.*' injectargs '--debug_osd=1/5'    # silence logging again
> * ceph osd unset ...                                # unset the flags
> 
> 
> In summary: with recovery prevented, marking an OSD out produces degraded objects, and marking that out OSD down so it
> re-peers produces even more degraded objects. Marking it in again discovers all the missing object copies.
> 
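
For anyone retracing the quoted steps, here is a condensed shell sketch of that sequence (my paraphrase, not a verified script; $osdid stands for whichever OSD you pick, and the three flags set at the start are unset explicitly at the end):

  osdid=3                                            # hypothetical: any OSD id in the cluster
  ceph osd set pause                                 # stop client I/O so writes don't disturb the test
  ceph osd set nobackfill
  ceph osd set norecover                             # keep the degradation from being repaired
  ceph tell 'osd.*' injectargs '--debug_osd=20/20'   # verbose OSD logging

  ceph osd out "$osdid"                              # objects become degraded already
  ceph pg ls degraded                                # pick a PG, then: ceph pg <pgid> query

  ceph osd down "$osdid"                             # force re-peering; the OSD comes back up on its own
  ceph pg ls degraded                                # even more objects degraded now

  ceph osd in "$osdid"                               # cluster returns to healthy
  ceph tell 'osd.*' injectargs '--debug_osd=1/5'     # quiet logging again
  ceph osd unset pause
  ceph osd unset nobackfill
  ceph osd unset norecover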


I've posted the level-20 log of an OSD to https://tracker.ceph.com/issues/37439


