Hi Mykola,

Thank you for your kind advice! I created a tracker ticket and a pull request:

http://tracker.ceph.com/issues/24373
https://github.com/ceph/ceph/pull/22358

If I made a mistake, please let me know.

Thanks,
Kouya

Mykola Golub <to.my.trociny@xxxxxxxxx> writes:

> Hi Kouya,
>
> Thank you for reporting and trying to fix this!
>
> Could you please create a tracker ticket [1] to make sure it does not
> get lost?
>
> I also think it would be much easier to review your patches (and bring
> them to the core developers' attention) if you created a pull request [2].
> If you do, please add the PR link to the tracker ticket.
>
> If you have problems with any of this, just let us know and I can do it
> for you.
>
> [1] http://tracker.ceph.com/projects/rados
> [2] https://github.com/ceph/ceph
>
> On Mon, May 28, 2018 at 06:36:18PM +0900, Kouya Shimura wrote:
>> Hi,
>>
>> I've found that a PG can get stuck in 'unfound_recovery' forever after
>> some OSDs are marked down.
>>
>> For example, the following steps reproduce it:
>>
>> 1) Create an EC 2+1 pool. Assume a PG has the up/acting set [1,0,2].
>> 2) Execute "ceph osd out osd.0 osd.2". The PG's up/acting set becomes [1,3,5].
>> 3) Put some objects into the PG.
>> 4) Execute "ceph osd in osd.0 osd.2". Recovery to [1,0,2] starts.
>> 5) Execute "ceph osd down osd.3 osd.5". (These downs are fake; osd.3
>>    and osd.5 are not actually down.) This puts the PG into
>>    'unfound_recovery', where it stays forever.
>>
>> Interestingly, this bad situation can be resolved by marking down
>> another OSD:
>>
>> 6) Executing "ceph osd down osd.0" (any OSD in the acting set works)
>>    resolves 'unfound_recovery' and restarts recovery.
>>
>> From my investigation, if the downed OSD is not a member of the current
>> up/acting set, its PG may stay in 'ReplicaActive' and discard peering
>> requests from the primary, so the primary OSD can never leave the
>> unfound state. PGs on the downed OSD should transition to the 'Reset'
>> state and start peering.
>>
>> I'll post two patches. The first one fixes this issue.
>> The second one is a trivial optimization (optional).
>>
>> Thanks,
>> Kouya
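
[Editor's note: the quoted reproduction steps consolidated into a rough shell sketch.
The erasure-code profile name "ec21", pool name "ecpool", pg_num of 32, the object
names, and the /etc/hosts payload are all hypothetical placeholders; the actual
up/acting set on your cluster will differ from [1,0,2], so the OSD ids in the
out/in/down steps must be adjusted to the PG you are watching.]

    #!/bin/sh
    # Sketch of the reproduction steps from the report above.
    # Adjust OSD ids to the up/acting set of the PG you are observing.

    # 1) Create an EC 2+1 pool.
    ceph osd erasure-code-profile set ec21 k=2 m=1
    ceph osd pool create ecpool 32 32 erasure ec21

    # Pick a PG from the pool and note its up/acting set, e.g. [1,0,2].
    ceph pg ls-by-pool ecpool

    # 2) Mark two OSDs of the acting set out; the PG remaps, e.g. to [1,3,5].
    ceph osd out osd.0 osd.2

    # 3) Write some objects into the pool.
    for i in $(seq 1 100); do
        rados -p ecpool put "obj-$i" /etc/hosts
    done

    # 4) Mark the OSDs back in; recovery towards [1,0,2] starts.
    ceph osd in osd.0 osd.2

    # 5) Fake-mark the interim OSDs down; the PG enters 'unfound_recovery'
    #    and never leaves it.
    ceph osd down osd.3 osd.5
    ceph pg ls-by-pool ecpool    # observe the stuck 'unfound_recovery' state

    # 6) Workaround: marking down any OSD in the acting set restarts peering
    #    and clears the stuck state.
    ceph osd down osd.0

The interesting observation is the asymmetry between steps 5 and 6: marking down an
OSD outside the current up/acting set (step 5) leaves the replica PGs in
'ReplicaActive' discarding the primary's peering requests, while marking down an OSD
inside the acting set (step 6) forces a new peering round that recovers the PG.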