Re: backfill_unfound state reset to clean after osd restart

Mykola Golub <to.my.trociny@xxxxxxxxx> · Sat, 1 May 2021 20:29:48 +0300

On Thu, Apr 22, 2021 at 04:16:34PM +0300, Mykola Golub wrote:

> I would like to bring some attention to a problem we have been
> observing with nautilus, and which I reported here [1].
> 
> If a pg is in backfill_unfound state ("unfound" objects were detected
> during backfill), and one of the osds from the active set is restarted
> the state changes to clean, losing the information about unfound
> objects.
> 
> And when I tried to reproduce the issue on the master with the same
> scenario, the status did not change, but I was observing the primary
> osd crash after a non-primary restart.

Ok. Now I seem to have a better understanding of what is going on here.

As I wrote in [1], when `PrimaryLogPG::on_failed_pull` is called
because the object is not found on the backfill source osd, the oid is
removed from `backfills_in_flight` only if the backfill source is
primary [2]. In our case we are backfilling a non-primary EC shard, so
the oid is not removed from `backfills_in_flight`, and this later
causes the assertion failure in `PrimaryLogPG::_clear_recovery_state`.

The behavior seems to have been changed during the post-nautilus
refactoring, in [3]. Previously, for the EC backend, the oid was
removed from `backfills_in_flight` unconditionally, and now it is
removed only if the source is primary.

In [1] I questioned this change, but after investigating how it works,
now it looks quite reasonable to me.

So, the current behavior is: in `PrimaryLogPG::recover_backfill`,
because the "unfound" oid is not removed from `backfills_in_flight`,
`next_backfill_to_complete` is always set to the "unfound" oid [4],
and `new_last_backfill` is no longer updated and keeps pointing to the
object before the "unfound" oid. The backfill still continues and
terminates only after all objects are pulled/pushed, but the
"complete" position remains on the object before "unfound". After the
backfill is finished the pg enters "backfill_unfound" state. When the
pg is re-peered (e.g. after restarting an osd) it enters "backfilling"
state, starting the backfill from the "unfound" oid position, detects
the "unfound" object again, scans the remaining objects, detecting
that they are already copied, and enters "backfill_unfound" state
again with the same "complete" position on the "unfound" object.
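
To illustrate the bookkeeping described above, here is a toy Python
model. It is only a sketch of my reading of the logic, not the actual
Ceph code: the function `advance_last_backfill`, the integer oids and
the end-of-scan value 6 are made up for illustration, and only the
names `backfills_in_flight`, `next_backfill_to_complete` and
`new_last_backfill` come from the real code.

```python
def advance_last_backfill(last_backfill, pending_updates,
                          backfills_in_flight, backfill_pos):
    """Toy model of the "complete" position bookkeeping (not Ceph code)."""
    # The earliest oid still in flight caps how far the position may move;
    # with nothing in flight the cap is the current scan position.
    if backfills_in_flight:
        next_backfill_to_complete = min(backfills_in_flight)
    else:
        next_backfill_to_complete = backfill_pos
    new_last_backfill = last_backfill
    for oid in sorted(pending_updates):
        if oid >= next_backfill_to_complete:
            break  # everything from here on waits behind the in-flight oid
        new_last_backfill = oid
    return new_last_backfill


# Objects 1..5; object 3 is "unfound" and stays in backfills_in_flight.
# Objects 1, 2, 4, 5 were copied; the scan position (6) is past the end.
print(advance_last_backfill(0, [1, 2, 4, 5], {3}, 6))  # prints 2
```

In this model the position stops at object 2 even though objects 4
and 5 were copied, which matches the "complete" position remaining on
the object before "unfound".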

This looks like reasonable behavior to me, and the only problem is
the reported assertion failure, which probably just needs to be
removed?

In Nautilus, because the "unfound" oid is removed from
`backfills_in_flight`, the "complete" position does not stop at this
oid. When the backfill is finished the pg also enters
"backfill_unfound" state, but the "complete" backfill position is now
at the end. So when the pg is re-peered, the backfill is not restarted
from the "unfound" position, the "unfound" object is not detected, and
the pg enters "clean" state.
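
The difference between the two re-peer outcomes can be sketched with
another toy Python model. Again this is an assumption-laden
illustration, not Ceph code: `state_after_repeer` and the integer oids
are hypothetical, and the model simply assumes that re-peering rescans
only the objects after the saved "complete" position.

```python
def state_after_repeer(objects, unfound, last_backfill):
    """Toy model: pg state after re-peering (not Ceph code)."""
    # Backfill resumes after the saved "complete" position; objects at or
    # before it are assumed to be backfilled and are not looked at again.
    rescanned = [oid for oid in objects if oid > last_backfill]
    detected = [oid for oid in rescanned if oid in unfound]
    return "backfill_unfound" if detected else "clean"


objects = range(1, 6)  # objects 1..5, object 3 is "unfound"

# Post-nautilus: the position stayed on object 2, so object 3 is rescanned.
print(state_after_repeer(objects, {3}, 2))  # prints backfill_unfound

# Nautilus: the position reached the end (5), so nothing is rescanned.
print(state_after_repeer(objects, {3}, 5))  # prints clean
```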

If my understanding is correct, it looks like we have to:

1) in master, fix the assertion failure, probably by just removing the
   assertion, and backport the fix.

2) in nautilus (direct commit), make the EC backend not remove the
   "unfound" oid from `backfills_in_flight`, to get the post-nautilus
   behavior.

Does it make sense?

[1] https://tracker.ceph.com/issues/50351#note-1
[2] https://github.com/ceph/ceph/blob/813933f81e3d682a0b1ae6dd906e38e78c4859a4/src/osd/PrimaryLogPG.cc#L12453
[3] https://github.com/ceph/ceph/commit/8a8947d2a32d6390cb17099398e7f2212660c9a1
[4] https://github.com/ceph/ceph/blob/813933f81e3d682a0b1ae6dd906e38e78c4859a4/src/osd/PrimaryLogPG.cc#L14010

> 
> [1] https://tracker.ceph.com/issues/50351

-- 
Mykola Golub