Re: How to reduce the influence on the IO when an osd is marked out?

Sage Weil <sage@xxxxxxxxxxxx> · Sat, 10 Oct 2015 06:39:12 -0700 (PDT)

[dropping ceph-users]

On Sat, 10 Oct 2015, wangsongbo wrote:
> Hi all,
> when an osd is marked out, related IO will be blocked, in which case
> applications built on ceph will fail. According to our test results, the
> more data there is, the longer the IO stays blocked.
> How to reduce the impact of this process on the IO?

When you mark an osd out, the mon runs prime_pg_temp, which
preemptively remaps the PG to the same OSDs.  This should make peering
fast... except that the OSDs still have to do a cycle of up_thru updates.

Sam, I think we can do the following to avoid this:

 - in build prior, we can infer that the interval for last_epoch_started is
also a rw interval (because clearly it finished peering).

 - if the acting set and primary do not change, we can skip the up_thru 
update (because we will already infer rw from above).

I think the only caveat is that we can only skip the up_thru update 
once the entire cluster has a feature bit indicating they understand 
that last_epoch_started implies rw.
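
A rough sketch of the two checks and the caveat above, in Python rather
than the actual OSD code. Everything here is hypothetical and simplified:
Interval, maybe_went_rw, can_skip_up_thru and cluster_has_feature are
made-up names, not real Ceph functions, and "the interval for
last_epoch_started" is read as "the interval containing that epoch".

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Interval:
    """One past interval of a PG (hypothetical, simplified model)."""
    first: int                # first epoch of the interval
    last: int                 # last epoch of the interval
    acting: Tuple[int, ...]   # acting set, as OSD ids
    primary: int              # primary OSD id
    up_thru_recorded: bool    # primary's up_thru covers this interval


def maybe_went_rw(interval: Interval, last_epoch_started: int) -> bool:
    """Could this interval have accepted writes?"""
    # Evidence we already rely on: an up_thru update was recorded.
    if interval.up_thru_recorded:
        return True
    # Proposed inference: the interval containing last_epoch_started
    # clearly finished peering, so treat it as rw as well.
    return interval.first <= last_epoch_started <= interval.last


def can_skip_up_thru(prev: Interval,
                     new_acting: Sequence[int],
                     new_primary: int,
                     cluster_has_feature: bool) -> bool:
    """Decide whether the up_thru round trip to the mon can be skipped."""
    # Caveat: every OSD must understand that last_epoch_started implies rw.
    if not cluster_has_feature:
        return False
    # Only safe when neither the acting set nor the primary changed.
    return prev.acting == tuple(new_acting) and prev.primary == new_primary
```

With prime_pg_temp keeping the acting set and primary the same across a
mark out, can_skip_up_thru would return True once the feature bit is
present everywhere, which is the case where the extra mon interaction
goes away.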

Anyway, this would mean that there's no subsequent mon interaction after 
the mark out (and probably lots of other common scenarios)...

What do you think?
sage