On Fri, Oct 27, 2017 at 12:46 AM Xinze Chi (信泽) <xmdxcxz@xxxxxxxxx> wrote:
>
> hi, all:
>
>     I am confused about the notify message during peering. For example:
>
>     In epoch 1, the primary osd does Peering, GetInfo and GetMissing, and
> calls the func proc_replica_log. In this func the last_complete and
> last_update may be reset.
>
>     Before going to Activate, the OSDMap changes (the new osdmap does not
> lead to a peering restart), and the non-primary osd sends the notify to
> the primary.

I don't think this can happen. The OSD won't re-send a notify during
the same peering interval, and even if it did the message would be
tagged with a new (higher) epoch so the PG wouldn't process it until
after it had switched states, right?

>
>     When the primary receives the notify, Primary::react(const
> MNotifyRec& notevt) runs, so it calls the func proc_replica_info.
>
>     In that func, we update the pg info, including the last_complete and
> last_update which were modified in proc_replica_log.

Note also that "PG::RecoveryState::Active::react(const MNotifyRec&
notevt)" does *not* unconditionally invoke proc_replica_info(). I think
you were trying to say we hadn't reached this state on receipt of the
message? But as I mentioned above, I think we block so that's not
actually possible either.

>
>     When the primary calls the func activate, the primary osd does
> recovery based on the pg info obtained from the notify instead of from
> proc_replica_log.
>
>     So is it a bug?

Have you seen issues in the wild, or are you just trying to understand
this code/algorithm? I would be surprised if we had undiscovered issues
here, just because our tests exercise peering quite vigorously, but I
might be missing what's happening in my own code skims.
-Greg
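
For illustration only, the epoch argument above can be sketched as a toy
model: a notify tagged with an epoch newer than the PG's current map is
parked rather than processed, and is only handled once the PG has advanced
to that epoch. This is not the Ceph implementation. Every name here (ToyPG,
Notify, on_notify, advance_map) is hypothetical, and the real peering state
machine does far more than this.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical, simplified types; these are NOT the real Ceph classes.
using epoch_t = std::uint32_t;

struct Notify {
  int from_osd;        // sender of the notify
  epoch_t sent_epoch;  // OSDMap epoch the sender tagged the message with
};

struct ToyPG {
  epoch_t map_epoch = 0;       // newest OSDMap epoch this PG has processed
  std::deque<Notify> waiting;  // notifies from epochs the PG has not reached
  int handled = 0;             // count of notifies actually processed

  // Process a notify only if the PG has caught up to the message's epoch;
  // otherwise park it. Returns true if the notify was processed now.
  bool on_notify(const Notify& n) {
    if (n.sent_epoch > map_epoch) {
      waiting.push_back(n);
      return false;
    }
    ++handled;
    return true;
  }

  // Advance to a newer map, then process any parked notifies that have
  // become eligible. (Assumption: epochs never move backwards.)
  void advance_map(epoch_t e) {
    assert(e >= map_epoch);
    map_epoch = e;
    std::deque<Notify> still_waiting;
    for (const Notify& n : waiting) {
      if (n.sent_epoch <= map_epoch) {
        ++handled;
      } else {
        still_waiting.push_back(n);
      }
    }
    waiting.swap(still_waiting);
  }
};
```

In this model, a notify sent at epoch 2 to a PG still at epoch 1 sits in
`waiting` and is counted in `handled` only after `advance_map(2)`, which
corresponds to the ordering claimed in the reply.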