On Tue, Aug 15, 2017 at 5:42 PM, sheng qiu <herbert1984106@xxxxxxxxx> wrote:
> Hi,
>
> Recently we hit an assert in can_discard_replica_op() while the
> OSD was handling a replica op reply. The assert comes from
> get_down_at(), which checks whether the source OSD still exists()
> and asserts otherwise.
>
> It seems that in our test environment the source OSD sent an op reply
> to the primary OSD and then died.
>
> My question: should we check exists() first to avoid the assert in
> get_down_at(), or is the OSD always expected to exist() in this
> situation?

An OSD "existing" just means it is in the OSDMap at all (it doesn't need
to be up or in). If you've managed to get an OSD sending ops during an
epoch in which it doesn't exist, something has gone terribly wrong; the
local assert is not the problem!

We can follow up on the assert at the ticket you made
(http://tracker.ceph.com/issues/21006).
-Greg
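To make the exists/up/in distinction concrete, here is a minimal toy model in C++. Everything in it (ToyOsdMap, ToyOsdInfo, add(), mark_down()) is hypothetical and is not Ceph's actual OSDMap code; it only assumes, as described above, that "exists" means "has an entry in the map" and that a get_down_at()-style accessor asserts when asked about an OSD with no entry.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-OSD record; NOT Ceph's real OSDMap layout.
struct ToyOsdInfo {
  bool exists = false;   // has an entry in the map at all
  bool up = false;       // currently running and reachable
  bool in = false;       // eligible to hold data
  uint32_t down_at = 0;  // epoch at which it was last marked down
};

// Toy stand-in for an OSD map, used only to illustrate the semantics.
class ToyOsdMap {
 public:
  explicit ToyOsdMap(int max_osd) : osds_(max_osd) {}

  // Create an entry for an OSD with the given up/in state.
  void add(int osd, bool up, bool in) {
    ToyOsdInfo &info = osds_.at(osd);
    info.exists = true;
    info.up = up;
    info.in = in;
    info.down_at = 0;
  }

  // Marking an OSD down does not remove it from the map: it still exists.
  void mark_down(int osd, uint32_t epoch) {
    ToyOsdInfo &info = osds_.at(osd);
    info.up = false;
    info.down_at = epoch;
  }

  bool exists(int osd) const {
    return osd >= 0 && osd < static_cast<int>(osds_.size()) &&
           osds_[osd].exists;
  }
  bool is_up(int osd) const { return exists(osd) && osds_[osd].up; }
  bool is_in(int osd) const { return exists(osd) && osds_[osd].in; }

  // Asking for the down epoch of an OSD that is not in the map at all is
  // treated as a caller bug, so it asserts rather than returning a default.
  uint32_t get_down_at(int osd) const {
    assert(exists(osd) && "queried an OSD that is not in the map");
    return osds_[osd].down_at;
  }

 private:
  std::vector<ToyOsdInfo> osds_;
};
```

In this model, an OSD that sent a reply and then died is simply marked down: it still satisfies exists(), so the accessor is safe to call. The assert only fires for an ID with no map entry, which matches the point above that hitting it signals a problem upstream rather than being the bug itself.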