constraining crush placement possibilities

Sage Weil <sage@xxxxxxxxxxx> · Thu, 6 Mar 2014 12:30:40 -0800 (PST)

During the CRUSH CDS session yesterday I talked a bit about the desire to 
constrain the number of possible disk combinations so that we reduce the 
probability that a concurrent failure causes data loss.  Sheldon just 
pointed out a talk from USENIX ATC '13 that discusses the basic problem:

	https://www.usenix.org/conference/atc13/technical-sessions/presentation/cidon

The situation with CRUSH is slightly better, I think, because the number 
of peers for a given OSD in a large cluster is bounded (pg_num / 
num_osds), but I think we may still be able to improve things.
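
To make the bound concrete, here is a rough back-of-the-envelope sketch in
Python.  It is not CRUSH code.  The function names are hypothetical, and it
assumes that every PG maps to a distinct set of OSDs and that a failure
takes out exactly `replicas` OSDs chosen uniformly at random.  Folding the
replica count into the per-OSD figure is also an assumption of the sketch,
not something stated above.

```python
from math import comb


def pgs_per_osd(pg_num, num_osds, replicas):
    """Average number of PG replicas hosted by each OSD."""
    return pg_num * replicas / num_osds


def max_peers(pg_num, num_osds, replicas):
    """Upper bound on the distinct peers of one OSD.

    Each hosted PG contributes at most (replicas - 1) other OSDs, and an
    OSD can never have more peers than there are other OSDs.
    """
    return min(num_osds - 1, pgs_per_osd(pg_num, num_osds, replicas) * (replicas - 1))


def loss_probability(pg_num, num_osds, replicas):
    """Upper bound on the chance that a random simultaneous failure of
    `replicas` OSDs hits every copy of at least one PG.

    Assumes each PG uses a distinct replica set, so there are at most
    pg_num "fatal" combinations out of C(num_osds, replicas).
    """
    return min(1.0, pg_num / comb(num_osds, replicas))


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from a real cluster.
    for pg_num in (1000, 10000, 100000):
        print(pg_num,
              max_peers(pg_num, 100, 3),
              loss_probability(pg_num, 100, 3))
```

Under these assumptions the loss probability grows linearly with the number
of distinct replica sets until it saturates, which is why reducing the
number of possible disk combinations helps.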

Last night it occurred to me that this is almost just having pgp_num < 
pg_num, but I think that's not quite right either.

If anyone has some clear intuition here, would love to hear it.  If there 
is anything we can do to improve things we definitely want to do it!

sage
