On Thu, 24 Mar 2016, Gregory Farnum wrote:
> On Thu, Mar 24, 2016 at 2:15 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> >
> > Ultimately, this is about shrinking the time it takes for a MON to
> > notice the "oops". Do we expect those things to be common and frequent
> > enough to justify an external daemon, however small and simple, on each
> > OSD node?
>
> Let's not forget that extra daemons aren't free quite apart from
> having to build them. There's a lot of user education to happen.
> There's more stuff to install; we'll have extra cephx keys for them
> that need to get placed; we need to update all our install and
> management tools to set them up. We'll probably run into new kinds of
> resource exhaustion, and we'll hit new errors around the local
> communication setup. :/ I'm uneasy about creating *any* mechanism that
> automatically marks down OSDs, but isn't directed by the OSD in
> question.
>
> Plus, I think there are other benefits of annotating our asserts more
> carefully. They're kind of a mess right now and if we were able to do
> more than crash on disk errors, it'd be nice when we move on to
> gathering statistics and things...

Yep, I'm sold! :)

Going back to Igor's PR...

	https://github.com/ceph/ceph/pull/7740

I think perhaps the first thing to do is to make a function like Ilya
suggested that is ceph_abort_markmedown() and then sort out where/when to
call it (instead of tackling signal handlers immediately).

It seems like the semantics need to be something like

 - queue the markdown message for the mon
 - wait for N seconds (where N=5 or so?)
 - ceph_abort()

There are maybe three call sites that come to mind that will probably
catch most issues:

 - the do_transaction (or equivalent) error code checks on write
 - a new helper that wraps up the checks/asserts about getting EIO on read
 - the internal heartbeat that goes off when a thread pool gets stuck

What else?

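To make the queue/wait/abort semantics above a bit more concrete, here is a rough standalone sketch. It is an illustration only, not code from the tree or from the PR: MarkdownWaiter, abort_markmedown(), the queue_markdown callback and the injectable do_abort are all made-up names, and the real thing would go through the OSD's mon client and ceph_abort() rather than std::abort(). It also assumes we stop waiting early if the mon acks, which is one possible reading of "wait for N seconds".

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdlib>
#include <functional>
#include <mutex>

// Hypothetical helper: lets the sender of the markdown message signal
// that the mon has acknowledged it, so we need not wait the full N seconds.
struct MarkdownWaiter {
  std::mutex lock;
  std::condition_variable cond;
  bool acked = false;

  // Would be called (e.g. from a messenger thread) when the mon acks.
  void ack() {
    {
      std::lock_guard<std::mutex> l(lock);
      acked = true;
    }
    cond.notify_all();
  }

  // Block until acked or until the grace period expires.
  // Returns true if the ack arrived in time.
  bool wait(std::chrono::milliseconds grace) {
    std::unique_lock<std::mutex> l(lock);
    return cond.wait_for(l, grace, [this] { return acked; });
  }
};

// Hypothetical stand-in for ceph_abort_markmedown().
//   queue_markdown: queues the message for the mon; must not block.
//   grace:          the "N seconds" above (assumed default: 5s).
//   do_abort:       defaults to std::abort(); injectable only so that
//                   the control flow can be exercised in a test.
inline bool abort_markmedown(
    const std::function<void(MarkdownWaiter&)>& queue_markdown,
    std::chrono::milliseconds grace = std::chrono::seconds(5),
    const std::function<void()>& do_abort = [] { std::abort(); }) {
  MarkdownWaiter w;
  queue_markdown(w);           // 1. queue the markdown message for the mon
  bool acked = w.wait(grace);  // 2. wait up to N seconds for it to go out
  do_abort();                  // 3. abort whether or not it was acked
  return acked;                // only reached if do_abort() returns
}
```

The point of the shape is that the abort is unconditional: a missing ack only costs us the grace period, it never keeps the OSD alive. A call site such as the write error check would just call abort_markmedown() with a callback that queues the message.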
We could also go for an OSD signal handler, but it would have to be a
best-effort sort of thing (obviously it won't work if the messenger is
busted), and it worries me a bit: what happens if there is a segv in the
memory allocator, we try to stay alive longer so that we can send
MarkMeDown, and as a result we continue processing some IO and in the
meantime let something corrupt reach disk or clients, or otherwise get
worse and propagate?

sage