Re: why degraded object must recover before accept new io

Gregory Farnum <gfarnum@xxxxxxxxxx> · Wed, 30 May 2018 15:57:16 -0700

On Wed, May 30, 2018 at 9:19 AM, zengran zhang <z13121369189@xxxxxxxxx> wrote:
> Hi,
>     Let's say the acting set is [3, 1, 0] and obj1 was marked missing on
> osd.0 after peering. New IO on obj1 will wait until obj1 is recovered.
> So my question is: why can't we do the new IO on [3, 1] and let osd.0
> keep missing obj1 without waiting for recovery, with osd.0 updating
> only the pglog, like backfill does? If the number of OSDs with the
> newest object is more than min_size, do we need to wait for recovery?

This isn't impossible to do, but it's *yet another* piece of metadata
we'd need to keep track of and account for in all the other recovery
and IO paths, so nobody's done it yet. Somebody would have to design
the algorithms, write the code, and persuade us the UX/performance
improvement is worth the ongoing maintenance burden. In particular,
note that any write-before-recovery needs to make sure it still obeys
the min_size rules, and I don't think any systems are set up to enable
that tracking separate from the acting set right now.
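
To make the min_size point concrete, here is a toy sketch in Python. It
is not Ceph code: PGState, up_to_date_replicas and
can_write_before_recovery are hypothetical names invented for
illustration, and the per-object count they rely on is exactly the extra
tracking that, as noted above, nothing is set up to do today. It only
shows the check a write-before-recovery path would have to pass.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PGState:
    """Toy model of a placement group; not Ceph's real data structures."""
    acting: list[int]   # OSD ids in the acting set, primary first
    min_size: int       # pool min_size
    # Hypothetical per-OSD missing sets: osd id -> names of missing objects
    missing: dict[int, set[str]] = field(default_factory=dict)

    def up_to_date_replicas(self, obj: str) -> list[int]:
        """Return the acting OSDs that are not missing obj."""
        return [osd for osd in self.acting
                if obj not in self.missing.get(osd, set())]

    def can_write_before_recovery(self, obj: str) -> bool:
        """Assumed rule: the OSDs holding the newest obj must still
        number at least min_size for the write to go ahead."""
        return len(self.up_to_date_replicas(obj)) >= self.min_size


# The example from the question: acting [3, 1, 0], obj1 missing on osd.0.
pg = PGState(acting=[3, 1, 0], min_size=2, missing={0: {"obj1"}})
print(pg.up_to_date_replicas("obj1"))
print(pg.can_write_before_recovery("obj1"))
```

With min_size=2 the write could proceed on [3, 1]; with min_size=3 the
same check fails and the object would have to be recovered first.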

I haven't looked closely at this code in a while so I don't know how
easy it would be to implement, or how high the bar for accepting such
a PR might be. It's not a bad thing to look at AFAIK, though! :)
-Greg

>
>     I see the new async recovery feature moves osd.0 from acting to
> async_recover_target, keeping the acting set bigger than min_size, and
> osd.0 is chosen because it has more missing objects, so objects
> missing on the acting set still need to be recovered first...
>
>    best regards
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html