On Thu, Apr 22, 2021 at 04:16:34PM +0300, Mykola Golub wrote:

> I would like to bring some attention to a problem we have been
> observing with nautilus, and which I reported here [1].
>
> If a pg is in backfill_unfound state ("unfound" objects were detected
> during backfill), and one of the osds from the active set is restarted
> the state changes to clean, losing the information about unfound
> objects.
>
> And when I tried to reproduce the issue on the master with the same
> scenario, the status did not change, but I was observing the primary
> osd crash after a non-primary restart.

Ok. Now I seem to have a better understanding of what is going on here.

As I wrote in [1], when `PrimaryLogPG::on_failed_pull` is called for an
object that is not found on the backfill source osd, the oid is removed
from `backfills_in_flight` only if the backfill source is the primary [2].
In our case we are backfilling a non-primary EC shard, so the oid is not
removed from `backfills_in_flight`, and later this causes the assertion
failure in `PrimaryLogPG::_clear_recovery_state`.

The behavior seems to have been changed during the post-nautilus
refactoring, in [3]. Previously, for the EC backend the oid was removed
from `backfills_in_flight` unconditionally, and now it is removed only if
the source is the primary. In [1] I questioned this change, but after
investigating how it works, it now looks quite reasonable to me.

So, the current behavior is:

In `PrimaryLogPG::recover_backfill`, because the "unfound" oid is not
removed from `backfills_in_flight`, `next_backfill_to_complete` is always
set to the "unfound" oid [4], and `new_last_backfill` is no longer
updated, so it keeps pointing to the object before the "unfound" oid. The
backfill still continues and terminates only after all objects are
pulled/pushed, but the "complete" position remains on the object before
the "unfound" one. After the backfill is finished the pg enters the
"backfill_unfound" state.

When the pg is re-peered (e.g.
after restarting an osd) it enters the "backfilling" state, starting the
backfill from the "unfound" oid position, detects the "unfound" object
again, scans the remaining objects, detecting that they are already
copied, and enters the "backfill_unfound" state again with the same
"complete" position on the "unfound" object.

This looks like reasonable behavior to me, and the only problem is the
reported assertion failure, which probably just needs to be removed?

In nautilus, because the "unfound" oid is removed from
`backfills_in_flight`, the "complete" position does not stop at this oid,
and when the backfill is finished the pg also enters the
"backfill_unfound" state, but the "complete" backfill position is now at
the end. So when the pg is re-peered, the backfill is not restarted from
the "unfound" position, the "unfound" object is not detected, and the pg
enters the "clean" state.

If my understanding is correct, it looks like we have to:

1) in master, fix the assertion failure, probably by just removing the
   assertion, and backport the fix.

2) in nautilus (direct commit), make the EC backend not remove the
   "unfound" oid from `backfills_in_flight`, to get the post-nautilus
   behavior.

Does it make sense?

[1] https://tracker.ceph.com/issues/50351#note-1
[2] https://github.com/ceph/ceph/blob/813933f81e3d682a0b1ae6dd906e38e78c4859a4/src/osd/PrimaryLogPG.cc#L12453
[3] https://github.com/ceph/ceph/commit/8a8947d2a32d6390cb17099398e7f2212660c9a1
[4] https://github.com/ceph/ceph/blob/813933f81e3d682a0b1ae6dd906e38e78c4859a4/src/osd/PrimaryLogPG.cc#L14010

>
> [1] https://tracker.ceph.com/issues/50351

-- 
Mykola Golub
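
P.S. To make the difference between the two behaviors easier to follow, here is a rough toy model of how the "complete" position ends up in each case. This is only a sketch and not Ceph code: objects are plain integers in backfill order, and the function names (`last_complete_after_backfill`, `repeer_redetects_unfound`) and the `drop_unfound_from_in_flight` flag are made up for illustration. The only thing taken from the description above is the rule that the "complete" position cannot move past the first oid still left in the in-flight set.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy model, not Ceph code. Objects are positive ints in ascending
// backfill order; 0 stands for "nothing is complete yet".

// Returns the last object considered "complete" once every object has
// been pulled/pushed. The pull of `unfound_oid` fails; the (hypothetical)
// flag selects whether that oid is still dropped from the in-flight set
// (nautilus-like) or kept there (post-nautilus-like).
int last_complete_after_backfill(const std::vector<int>& objects,
                                 int unfound_oid,
                                 bool drop_unfound_from_in_flight) {
  assert(std::is_sorted(objects.begin(), objects.end()));
  std::set<int> in_flight(objects.begin(), objects.end());
  for (int oid : objects) {
    const bool pull_failed = (oid == unfound_oid);
    if (!pull_failed || drop_unfound_from_in_flight)
      in_flight.erase(oid);
  }
  // The "complete" position only advances over objects that sort before
  // the first oid still in flight.
  int last_complete = 0;
  for (int oid : objects) {
    if (!in_flight.empty() && oid >= *in_flight.begin())
      break;
    last_complete = oid;
  }
  return last_complete;
}

// After re-peering, backfill resumes after `last_complete`, so the
// unfound object is only looked at again if it sorts after that position.
bool repeer_redetects_unfound(int last_complete, int unfound_oid) {
  return unfound_oid > last_complete;
}
```

With objects 1..5 and object 3 unfound, keeping the oid in flight leaves the "complete" position at 2, so a re-peer rescans from 3 and finds the unfound object again. Dropping the oid lets the position reach 5, so a re-peer has nothing left to scan and the unfound object is forgotten.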