So the Stray sets send_notify to false only when it goes to activate?

2017-11-02 10:27 GMT+08:00 Gregory Farnum <gfarnum@xxxxxxxxxx>:
> On Wed, Nov 1, 2017 at 5:27 PM Xinze Chi (信泽) <xmdxcxz@xxxxxxxxx> wrote:
>>
>> 2017-11-02 4:26 GMT+08:00 Gregory Farnum <gfarnum@xxxxxxxxxx>:
>> > On Fri, Oct 27, 2017 at 12:46 AM Xinze Chi (信泽) <xmdxcxz@xxxxxxxxx> wrote:
>> >>
>> >> hi, all:
>> >>
>> >>     I am confused about the notify message during peering. For example:
>> >>
>> >>     In epoch 1, the primary OSD does Peering, GetInfo and GetMissing,
>> >> and calls the func proc_replica_log. In this func the last_complete and
>> >> last_update may be reset.
>> >>
>> >>     Before going to Activate, the OSDMap changes (the new osdmap does
>> >> not lead to restarting peering), and the non-primary OSD sends the
>> >> notify to the primary.
>> >
>> >
>> > I don't think this can happen. The OSD won't re-send a notify during
>> > the same peering interval, and even if it did the message would be
>> > tagged with a new (higher) epoch so the PG wouldn't process it until
>> > after it had switched states, right?
>> >
>>
>> I just want to understand this algorithm. When the Stray OSD receives
>> ActMap, it would send_notify even during the same peering interval. See
>> Stray::react(const ActMap&).
>
> Note the
>
> if (pg->should_send_notify()
>
> check preceding that block. It checks a boolean send_notify value that
> is set true only when it enters a new peering interval, and is set
> false as soon as it shares its info. So I don't think the primary's
> behavior matters at all (other than from a security perspective,
> anyway).
>
>
>> You say the primary OSD wouldn't process the notify msg, but I cannot
>> find that in the code. The primary calls handle_pg_notify and
>> processes it.
>
> I didn't actually track the order of the state machine here; I just
> saw that PG::RecoveryState::Active::react(const MNotifyRec& notevt)
> will throw them out if it's already seen the info.
> You're right,
> PG::RecoveryState::Primary::react(const MNotifyRec& notevt) will
> process it unconditionally. I'm not sure if those are the replica and
> primary states, or if you move from Primary to Active (or vice versa).
> -Greg
>
>> >>
>> >>     When the primary receives the notify, Primary::react(const
>> >> MNotifyRec& notevt) runs, so it calls the func proc_replica_info.
>> >>
>> >>     In that func, we update the pg info, including the last_complete
>> >> and last_update which were modified in proc_replica_log.
>> >
>> > Note also that "PG::RecoveryState::Active::react(const MNotifyRec&
>> > notevt)" does *not* unconditionally invoke proc_replica_info(). I
>> > think you were trying to say we hadn't reached this state on receipt
>> > of the message? But as I mentioned above, I think we block so that's
>> > not actually possible either.
>> >
>> >>
>> >>     When the primary calls the func activate, the primary OSD does
>> >> recovery based on the pg info it got from the notify instead of from
>> >> proc_replica_log.
>> >>
>> >>     So is it a bug?
>> >
>> > Have you seen issues in the wild, or just trying to understand this
>> > code/algorithm? I would be surprised if we had undiscovered issues
>> > here just because our tests exercise peering quite vigorously, but I
>> > might be missing what's happening in my own code skims.
>> > -Greg
>>
>>
>>
>> --
>> Regards,
>> Xinze Chi

--
Regards,
Xinze Chi
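
To make the send_notify behaviour Greg describes easier to follow, here is a minimal toy model of it. It is only a sketch of the description in this thread, not Ceph's code: ToyStrayPG, start_new_interval(), on_act_map() and run_toy_scenario() are hypothetical names, and the real Stray::react(const ActMap&) does more than this. The only thing modelled is the flag: set true when a new peering interval starts, checked through should_send_notify(), and cleared once the info has been shared.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only; these are NOT Ceph's actual classes or signatures.
struct ToyStrayPG {
    bool send_notify = false;     // the flag discussed in the thread
    uint32_t interval_start = 0;  // epoch at which the current interval began
    int notifies_sent = 0;        // how many notifies this toy has "sent"

    // Hypothetical hook: a map change that starts a new peering interval.
    void start_new_interval(uint32_t epoch) {
        interval_start = epoch;
        send_notify = true;
    }

    bool should_send_notify() const { return send_notify; }

    // Hypothetical stand-in for the ActMap handler in the Stray state.
    void on_act_map() {
        if (should_send_notify()) {
            ++notifies_sent;      // share our info with the primary
            send_notify = false;  // cleared as soon as the info is shared
        }
    }
};

// Drives the toy through two intervals and returns the notifies sent.
inline int run_toy_scenario() {
    ToyStrayPG pg;
    pg.start_new_interval(1);
    pg.on_act_map();  // first ActMap in the interval: sends
    pg.on_act_map();  // same interval: flag is clear, nothing sent
    pg.on_act_map();  // same interval: still nothing sent
    pg.start_new_interval(5);
    pg.on_act_map();  // new interval: sends again
    return pg.notifies_sent;
}
```

Under these assumptions, repeated ActMap events inside one interval produce a single notify, which is the reason Greg gives for the primary not seeing a second notify before Activate. Whether the real code clears the flag at that point or only in activate is exactly the open question at the top of this mail, so the toy should not be read as settling it.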