Re: backfill_unfound state reset to clean after osd restart

On Sat, May 01, 2021 at 08:29:48PM +0300, Mykola Golub wrote:
> On Thu, Apr 22, 2021 at 04:16:34PM +0300, Mykola Golub wrote:
> 
> > I would like to bring some attention to a problem we have been
> > observing with nautilus, and which I reported here [1].
> > 
> > If a pg is in backfill_unfound state ("unfound" objects were detected
> > during backfill), and one of the osds from the active set is restarted
> > the state changes to clean, losing the information about unfound
> > objects.
> > 
> > And when I tried to reproduce the issue on master with the same
> > scenario, the status did not change, but I observed the primary
> > osd crash after a non-primary restart.
> 
> Ok. Now I seem to have a better understanding of what is going on here.
> 
> As I wrote in [1], when `PrimaryLogPG::on_failed_pull` is called
> because the object is not found on the backfill source osd, the oid is
> removed from `backfills_in_flight` only if the backfill source is
> primary [2]. In our case we are backfilling a non-primary EC shard, so
> the oid is not removed from `backfills_in_flight`, and this later
> causes the assertion failure in `PrimaryLogPG::_clear_recovery_state`.
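
A toy Python model of the bookkeeping described above, for illustration
only. It is not Ceph code: `ToyPG`, `simulate` and the shard ids are
hypothetical, and the check in `clear_recovery_state` (every oid still in
`backfills_in_flight` must also be in `recovering`) is an assumption about
what the failing assertion verifies.

```python
class ToyPG:
    """Toy stand-in for the primary's backfill bookkeeping (not Ceph code)."""

    def __init__(self, whoami):
        self.whoami = whoami               # hypothetical shard id of the primary
        self.recovering = set()
        self.backfills_in_flight = set()

    def start_backfill(self, oid):
        # An object being backfilled is tracked in both sets.
        self.recovering.add(oid)
        self.backfills_in_flight.add(oid)

    def on_failed_pull(self, sources, oid):
        # The failed object is always dropped from `recovering` ...
        self.recovering.discard(oid)
        # ... but leaves `backfills_in_flight` only if the primary was a source.
        if self.whoami in sources:
            self.backfills_in_flight.discard(oid)

    def clear_recovery_state(self):
        # Assumed invariant: anything still in flight is still recovering.
        for oid in sorted(self.backfills_in_flight):
            assert oid in self.recovering, f"{oid} in flight but not recovering"
        self.backfills_in_flight.clear()


def simulate(whoami, sources, oid="obj1"):
    """Return (oids left in flight after the failed pull, whether the assert fired)."""
    pg = ToyPG(whoami)
    pg.start_backfill(oid)
    pg.on_failed_pull(sources, oid)
    leftover = set(pg.backfills_in_flight)
    try:
        pg.clear_recovery_state()
        crashed = False
    except AssertionError:
        crashed = True
    return leftover, crashed
```

With the primary as shard 0, `simulate(0, {0})` leaves nothing in flight
and does not trip the check, while `simulate(0, {2})` (a non-primary
source) leaves the oid in flight and trips it.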
> 
> The behavior seems to have been changed during the post-nautilus
> refactoring, in [3]. Previously, for the EC backend, the oid was removed
> from `backfills_in_flight` unconditionally, and now it is removed only
> if the source is primary.
> 
> In [1] I questioned this change, but after investigating how it works,
> now it looks quite reasonable to me.
> 
> So, the current behavior is: In `PrimaryLogPG::recover_backfill`,
> because the "unfound" oid is not removed from `backfills_in_flight`,
> `next_backfill_to_complete` is always set to the "unfound" oid [4],
> and `new_last_backfill` is no longer updated, pointing to the object
> before the "unfound" oid. The backfill still continues and terminates
> only after all objects are pulled/pushed, but the "complete" position
> remains on the object before "unfound". After the backfill is finished,
> the pg enters the "backfill_unfound" state. When the pg is re-peered
> (e.g. after restarting an osd), it enters the "backfilling" state,
> starting the backfill from the "unfound" oid position, detects the
> "unfound" object again, scans the remaining objects, detecting that they
> are already copied, and enters the "backfill_unfound" state again with
> the same "complete" position on the "unfound" object.
> 
> This looks like reasonable behavior to me, and the only problem is
> the reported assertion failure; the assertion probably just needs to
> be removed?
> 
> In Nautilus, because the "unfound" oid is removed from
> `backfills_in_flight`, the "complete" position does not stop at this
> oid, and when the backfill is finished the pg also enters the
> "backfill_unfound" state, but the "complete" backfill position is at
> the end now. So when the pg is re-peered, the backfill is not restarted
> from the "unfound" position, the "unfound" object is not detected, and
> the pg enters the "clean" state.
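
The difference between the two behaviors can be sketched with a small
Python model. This is a simplification for illustration, not Ceph code:
`run_backfill`, its parameters and the single-letter object names are
hypothetical, and real backfill tracks far more state than a sorted list
and one "complete" position.

```python
def run_backfill(objects, unfound, last_backfill, keep_unfound_in_flight):
    """Model one backfill pass.

    objects: sorted list of oids; unfound: set of oids with no source;
    last_backfill: "complete" position from the previous pass (None = start);
    keep_unfound_in_flight: True models post-nautilus, False models nautilus.
    Returns (pg_state, new_last_backfill).
    """
    # Only objects past the "complete" position are scanned again.
    pending = [o for o in objects if last_backfill is None or o > last_backfill]

    in_flight = set()
    detected_unfound = False
    for oid in pending:
        if oid in unfound:
            detected_unfound = True
            if keep_unfound_in_flight:
                in_flight.add(oid)   # stays in flight, pins the position
        # Other objects are copied (or already present) and leave the set.

    # The "complete" position advances until the first in-flight object.
    new_last_backfill = last_backfill
    for oid in pending:
        if oid in in_flight:
            break
        new_last_backfill = oid

    state = "backfill_unfound" if detected_unfound else "clean"
    return state, new_last_backfill


def backfill_then_repeer(objects, unfound, keep_unfound_in_flight):
    """Run an initial backfill, then a second pass as if the pg re-peered."""
    first = run_backfill(objects, unfound, None, keep_unfound_in_flight)
    second = run_backfill(objects, unfound, first[1], keep_unfound_in_flight)
    return first, second
```

For objects `a`..`d` with `c` unfound, the post-nautilus variant
(`keep_unfound_in_flight=True`) stays in "backfill_unfound" with the
position at `b` across the re-peer, while the nautilus variant moves the
position to `d` and then reports "clean" after the re-peer.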
> 
> If my understanding is correct, it looks like we have to:
> 
> 1) in master, fix the assertion failure, probably by just removing the
>    assertion, and backport the fix.

https://github.com/ceph/ceph/pull/41270

> 
> 2) in nautilus (direct commit), make the EC backend not remove
>    "unfound" oid from `backfills_in_flight` to have post-nautilus
>    behavior.

https://github.com/ceph/ceph/pull/41293

> 
> Does it make sense?
> 
> [1] https://tracker.ceph.com/issues/50351#note-1
> [2] https://github.com/ceph/ceph/blob/813933f81e3d682a0b1ae6dd906e38e78c4859a4/src/osd/PrimaryLogPG.cc#L12453
> [3] https://github.com/ceph/ceph/commit/8a8947d2a32d6390cb17099398e7f2212660c9a1
> [4] https://github.com/ceph/ceph/blob/813933f81e3d682a0b1ae6dd906e38e78c4859a4/src/osd/PrimaryLogPG.cc#L14010
> 
> > 
> > [1] https://tracker.ceph.com/issues/50351
> 
> -- 
> Mykola Golub

-- 
Mykola Golub