Re: constraining crush placement possibilities

Provided that 3 OSDs are down simultaneously.

On 2014/3/7 11:51, Li Wang wrote:
Just had a quick look. It seems CRUSH could meet the demand:
say, if we have 100 OSDs and replica_num is 3, we partition the
100 OSDs into 3 trees, 'take' iterates over the 3 trees, and for each
tree we select 1 OSD. Then the probability of losing data is at most
n*n*n/C(n,3). Can we make it better?
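
For concreteness, here is a rough back-of-the-envelope sketch of that bound
(an illustration only, not from the original mail; it assumes the 100 OSDs
split into 3 equal trees and that data is lost exactly when the 3
simultaneously failed OSDs form one of the triples CRUSH can actually emit):

from math import comb

num_osds = 100                     # total OSDs, per the example above
num_trees = 3                      # replica_num: one replica drawn from each tree
per_tree = num_osds // num_trees   # ~33 OSDs per tree

# Triples CRUSH can emit when it must take exactly one OSD from each tree.
constrained_triples = per_tree ** 3

# All possible triples of simultaneously failed OSDs.
all_triples = comb(num_osds, 3)

# Upper bound on P(data loss | 3 random OSDs fail at once):
# the failed triple must be one of the triples actually in use.
print(constrained_triples / all_triples)   # ~0.22 for this example

With no constraint at all, any failed triple could in principle hold a PG, so
this is already a sizeable reduction; whether it can be pushed further is
exactly the question above.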


On 2014/3/7 4:30, Sage Weil wrote:
During the CRUSH CDS session yesterday I talked a bit about the desire to
constrain the number of possible disk combinations so that we reduce the
probability that a concurrent failure causes data loss.  Sheldon just
pointed out a talk from ATC that discusses the basic problem:

    https://www.usenix.org/conference/atc13/technical-sessions/presentation/cidon


The situation with CRUSH is slightly better, I think, because the number
of peers for a given OSD in a large cluster is bounded (pg_num /
num_osds), but I think we may still be able to improve things.

Last night it occurred to me that this is almost just having pgp_num <
pg_num, but I think that's not quite right either.
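
To make that intuition concrete, here is a rough sketch (an illustration
only, not from the thread; the pg_num/pgp_num figures below are made up):
if placement only ever uses K distinct replica sets (the replica-set notion
the ATC talk calls "copysets"), and 3 OSDs fail simultaneously and uniformly
at random, the chance of losing some PG is roughly K / C(N, 3). Anything
that shrinks K, whether partitioned trees or fewer distinct placements via a
smaller pgp_num, shrinks that probability proportionally:

from math import comb

def loss_probability(num_osds, distinct_replica_sets, replicas=3):
    # P(some replica set is entirely lost | `replicas` OSDs fail at once),
    # assuming uniformly random failures and counting each distinct replica
    # set once; a rough upper bound rather than an exact figure.
    return min(1.0, distinct_replica_sets / comb(num_osds, replicas))

# Hypothetical: 100 OSDs, pg_num = 4096, generously assuming every PG
# lands on its own triple.
print(loss_probability(100, 4096))   # ~0.025

# Same cluster, but placements funneled into only 256 distinct triples
# (e.g. via a much smaller pgp_num).
print(loss_probability(100, 256))    # ~0.0016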

If anyone has some clear intuition here, would love to hear it.  If there
is anything we can do to improve things we definitely want to do it!

sage

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




