On Thu, Mar 24, 2016 at 2:15 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>
> Ultimately, this is about shrinking the time it takes for a MON to
> notice the "oops". Do we expect those things to be common and frequent
> enough to justify an external daemon, however small and simple, on each
> OSD node?

Let's not forget that extra daemons aren't free quite apart from having to build them. There's a lot of user education to happen. There's more stuff to install; we'll have extra cephx keys for them that need to get placed; we need to update all our install and management tools to set them up. We'll probably run into new kinds of resource exhaustion, and we'll hit new errors around the local communication setup. :/

I'm uneasy about creating *any* mechanism that automatically marks down OSDs, but isn't directed by the OSD in question.

Plus, I think there are other benefits of annotating our asserts more carefully. They're kind of a mess right now and if we were able to do more than crash on disk errors, it'd be nice when we move on to gathering statistics and things...
-Greg