Re: backfill_unfound state reset to clean after osd restart

Mykola Golub <to.my.trociny@xxxxxxxxx> · Wed, 19 May 2021 13:21:55 +0300

On Wed, May 19, 2021 at 09:39:08AM -0000, Jin Hase wrote:
> > Suppose we have a 2+1 EC pool, and an object is missing 2 shards on
> > both non-primary osds. We initiate backfill by setting a non-primary
> > osd out. During the backfill the primary osd detects the missing
> > shards and the pg enters "backfill_unfound" state, the last_backfill
> > position is properly set to the object before the "unfound" (in
> > post-nautilus, for nautilus I opened [1] to make it work). If
> > re-peering occurs because a non-primary osd is restarted, the backfill
> > is restarted from the last_backfill position and the "unfound" object
> > is detected again. But if re-peering occurs because the primary osd is
> > temporarily stopped (restarted), another non-primary osd becomes
> > primary and "drives" the backfill from the last_backfill position, and
> > as the shard is missing here it is just skipped from the backfill, the
> > missing object is not detected and the pg enters clean state.
> > 
> > Is there something that can/should be improved here? It is rather
> > unfortunate that the information about missing object is lost on the
> > restart (until scrub or next backfill). On the other hand, a
> > situation where many shards are missing for an object is rather
> > unlikely. Also, if for example it happened that the shard was missing
> > on the primary it would not even be detected on backfill.
> > 
> > [1] https://github.com/ceph/ceph/pull/41293
> 
> In the case of primary osd, is there a case where the user wants to reset the state (from unfound state)?
> If we fix this behavior, is there another problem because we can't reset the state?

I am not sure I fully understand your question, but I will try to
answer.

It is not that someone "wants" to reset the "backfill_unfound" state.
The pg state machine can enter the "backfill_unfound" state only after
the "backfilling" state, if "missing" objects are detected during
backfill.

Now, when the pg is peering (e.g. after one of the osds is stopped) and
is "finding out" its state, in the activating step it calls the
`needs_backfill()` function [1] to check whether it needs to enter the
backfilling state. If it does (i.e. the last_backfill position is not
MAX for one of the backfill targets), it starts backfilling, detects a
"missing" object (if there is one) and enters the "backfill_unfound"
state. This happens on every pg peering. But if the peering is caused
by a primary osd change, the new primary osd may not detect the
"missing" object if its own shard of that object happens to be missing,
and the backfill completes with the pg in the clean state.
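
To illustrate, here is a toy model of that behavior. It is not Ceph
code: the function bodies, the dict layout and the object/osd names are
all made up for the example, and MAX is modelled as None. It only
assumes that the primary scans its own objects past last_backfill and
needs 2 of 3 shards to rebuild an object.

```python
# Toy model only, not Ceph code. All names below are hypothetical.
K = 2  # shards needed to reconstruct an object in a 2+1 EC pool


def needs_backfill(last_backfill_by_target):
    """True if some backfill target has not reached MAX (None here)."""
    return any(pos is not None for pos in last_backfill_by_target.values())


def run_backfill(primary, sources, shards, last_backfill):
    """Scan the primary's own objects past last_backfill.

    shards maps osd id -> set of object names whose shard exists there.
    Returns the pg state the toy model ends up in.
    """
    for obj in sorted(shards[primary]):
        if obj <= last_backfill:
            continue  # already backfilled
        available = sum(1 for osd in sources if obj in shards[osd])
        if available < K:
            return "backfill_unfound"
    return "clean"


# obj_b has lost its shards on both non-primary osds (1 and 2).
shards = {
    0: {"obj_a", "obj_b", "obj_c"},
    1: {"obj_a", "obj_c"},
    2: {"obj_a", "obj_c"},
}
last_backfill = "obj_a"  # position just before the unfound object

# Peering with the original primary (osd 0): obj_b is scanned and unfound.
state_old_primary = run_backfill(0, [0, 1, 2], shards, last_backfill)

# Primary osd 0 stopped, osd 1 takes over: obj_b is not in its local
# list, so it is never scanned and the backfill finishes.
state_new_primary = run_backfill(1, [1, 2], shards, last_backfill)
```

In this sketch state_old_primary is "backfill_unfound" and
state_new_primary is "clean", which matches what I described above.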

This is how it works. The current behavior seems to be "by design" to
me; I don't know whether it could (or should) be improved. That was
actually my question to the community.

Hope it helps.

[1] https://github.com/ceph/ceph/blob/5d8d691da2b96981a6d5d11e4c4142a2e08c930b/src/osd/PeeringState.cc#L1368

-- 
Mykola Golub