> The complicating case here is the OSD status set.  Running this
> through a single Paxos limits the number of OSDs that can coexist in a
> cluster.  We ought to split the set of OSDs between multiple masters to
> distribute the load.

Do the "multiple masters" you mention here mean the logical OSDs running
in one OSD process, as described in # Interfaces #?

> Each 'Up' or 'Down' event is independent of
> others, so all we require is that events get propagated to the
> correct OSDs and that primaries and followers act as they're supposed to.
>
> Versioning is a bigger problem here.  We might have all masters
> increment their version when one increments its version, if that could
> be managed without inefficiency.  We might send a compound version with
> `MOSDOp`s, but combining that with the compound version above might be
> unwieldy.  (Feedback on this issue would be greatly appreciated.)

Cheers,
S

----- Original Message -----
> From: "Adam C. Emerson" <aemerson@xxxxxxxxxx>
> To: "The Sacred Order of the Squid Cybernetic" <ceph-devel@xxxxxxxxxxxxxxx>
> Sent: Friday, April 15, 2016 5:05:37 PM
> Subject: Ann Arbor Team's Flexible I/O Proposals (Ceph Next)
>
> Ceph Developers,
>
> We've put together a few of the main ideas from our previous work in a
> brief form that we hope people will be able to digest, consider, and
> debate.  We'd also like to discuss them with you at Ceph Next this
> Tuesday.
>
> ## The OSD Set ##
>
> The complicating case here is the OSD status set.  Running this
> through a single Paxos limits the number of OSDs that can coexist in a
> cluster.  We ought to split the set of OSDs between multiple masters to
> distribute the load.  Each 'Up' or 'Down' event is independent of
> others, so all we require is that events get propagated to the
> correct OSDs and that primaries and followers act as they're supposed to.
>
> Versioning is a bigger problem here.
> We might have all masters
> increment their version when one increments its version, if that could
> be managed without inefficiency.  We might send a compound version with
> `MOSDOp`s, but combining that with the compound version above might be
> unwieldy.  (Feedback on this issue would be greatly appreciated.)

When Tom Keiser and I considered the problem of distributing AFS3 data
for a single vnode across multiple data servers, IIRC we both arrived at
the notion of a compound DataVersion (or "range dv") as the extension of
DataVersion to the partitioned object.  It feels like a similar structure
naturally arises here, though I admit I have not thought about this
problem in a while.

Regards,

Matt

--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103
http://www.redhat.com/en/technologies/storage

tel.  734-707-0660
fax.  734-769-8938
cel.  734-216-5309

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Email: shinobu@xxxxxxxxx
GitHub: shinobu-x
Blog: Life with Distributed Computational System based on OpenSource
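[Editor's note: for concreteness, the "compound version" that both messages gesture at can be sketched as a per-master version vector: each status master increments only its own component, and an observer's view is ordered by component-wise comparison. This is a hypothetical illustration under that assumption, not actual Ceph code; all names (`CompoundVersion`, `bump`, `dominates`, `merge`) are invented for the sketch.]

```cpp
#include <cstdint>
#include <map>

using MasterId = std::uint32_t;
using Epoch    = std::uint64_t;

// One epoch per OSD-status master; absent entries are treated as epoch 0.
using CompoundVersion = std::map<MasterId, Epoch>;

// A master bumps only its own component when its slice of the OSD set
// changes; other masters' components are untouched.
inline void bump(CompoundVersion& v, MasterId m) { ++v[m]; }

// a is at least as new as b iff every component of b is <= the matching
// component of a.  Versions from disjoint masters are incomparable,
// which is exactly the partial order a partitioned status set induces.
inline bool dominates(const CompoundVersion& a, const CompoundVersion& b) {
    for (const auto& [m, e] : b) {
        auto it = a.find(m);
        if ((it == a.end() ? Epoch{0} : it->second) < e) return false;
    }
    return true;
}

// Component-wise max: the version an observer records after hearing
// from several masters (or from peers relaying their maps).
inline CompoundVersion merge(const CompoundVersion& a,
                             const CompoundVersion& b) {
    CompoundVersion out = a;
    for (const auto& [m, e] : b) {
        auto& slot = out[m];
        if (slot < e) slot = e;
    }
    return out;
}
```

This is the same shape as a version vector; the open question from the thread, how such a vector composes with the other compound version carried in `MOSDOp`s without growing unwieldy, is untouched by the sketch.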