On Fri, Mar 25, 2016 at 1:54 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Thu, 24 Mar 2016, Gregory Farnum wrote:
>> On Thu, Mar 24, 2016 at 2:15 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>> >
>> > Ultimately, this is about shrinking the time it takes for a MON to
>> > notice the "oops". Do we expect those things to be common and frequent
>> > enough to justify an external daemon, however small and simple, on each
>> > OSD node?
>>
>> Let's not forget that extra daemons aren't free quite apart from
>> having to build them. There's a lot of user education to happen.
>> There's more stuff to install; we'll have extra cephx keys for them
>> that need to get placed; we need to update all our install and
>> management tools to set them up. We'll probably run into new kinds of
>> resource exhaustion, and we'll hit new errors around the local
>> communication setup. :/ I'm uneasy about creating *any* mechanism that
>> automatically marks down OSDs, but isn't directed by the OSD in
>> question.
>>
>> Plus, I think there are other benefits of annotating our asserts more
>> carefully. They're kind of a mess right now and if we were able to do
>> more than crash on disk errors, it'd be nice when we move on to
>> gathering statistics and things...
>
> Yep, I'm sold! :)
>
> Going back to Igor's PR...
>
> https://github.com/ceph/ceph/pull/7740
>
> I think perhaps the first thing to do is to make a function like
> Ilya suggested that is
>
> ceph_abort_markmedown()
>
> and then sort out where/when to call it (instead of tackling signal
> handlers immediately). It seems like the semantics need to be something
> like
>
> - queue the markdown message for the mon
> - wait for N seconds (where N=5 or so?)
> - ceph_abort()

Is it to wait for the message to go out? If so, maybe request a
MarkMeDown ack and have an N-second Cond timeout? Modifying
OSD::dispatch() or wiring it up through the service abstraction shouldn't
be hard - an ack would take a lot less than a second.
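One way to read the ack-plus-timeout suggestion, as a minimal C++ sketch. All names here (`MarkDownWaiter`, `markmedown_and_abort`, and the `send_markdown` / `do_abort` callbacks) are hypothetical and are not existing Ceph APIs. It uses `std::condition_variable` in place of Ceph's own `Cond`, and assumes the monitor's ack is routed to `on_ack()` from the dispatch path. The abort step is injected so the logic can be exercised without killing the process.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdlib>
#include <functional>
#include <mutex>

// Hypothetical helper (not a Ceph API): tracks whether the monitor has
// acknowledged our MarkMeDown request.
class MarkDownWaiter {
 public:
  // Assumed to be called from the dispatch path when the ack arrives.
  void on_ack() {
    std::lock_guard<std::mutex> lock(mtx_);
    acked_ = true;
    cond_.notify_all();
  }

  // Blocks until the ack arrives or the timeout expires; returns true
  // only if the ack was received.
  bool wait_for_ack(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mtx_);
    return cond_.wait_for(lock, timeout, [this] { return acked_; });
  }

 private:
  std::mutex mtx_;
  std::condition_variable cond_;
  bool acked_ = false;
};

// Hypothetical ceph_abort_markmedown()-style routine: queue MarkMeDown,
// wait a bounded time for the ack, then abort whether or not it arrived.
// do_abort receives the ack status; by default it calls std::abort().
bool markmedown_and_abort(
    MarkDownWaiter& waiter,
    const std::function<void()>& send_markdown,
    std::chrono::milliseconds timeout,
    const std::function<void(bool)>& do_abort = [](bool) { std::abort(); }) {
  assert(timeout.count() >= 0);
  send_markdown();  // queue the message for the mon
  const bool acked = waiter.wait_for_ack(timeout);
  do_abort(acked);  // never skipped, even when the wait times out
  return acked;     // only reached if an injected do_abort returns
}
```

The point of this shape is that the wait is bounded: if the ack never arrives (for example because the messenger itself is broken), the daemon still aborts once the timeout expires, which keeps the "wait for N seconds, then ceph_abort()" semantics while usually finishing far sooner when the mon answers quickly.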
>
> There are maybe three call sites that come to mind that will probably
> catch most issues:
>
> - the do_transaction (or equivalent) error code checks on write
> - a new helper that wraps up the checks/asserts about getting EIO on read
> - the internal heartbeat that goes off when a thread pool gets stuck
>
> What else?
>
> We could also go for an OSD signal handler, but it would have to be a
> best-effort sort of thing (obviously won't work if the messenger is
> busted), and it worries me a bit: what happens if there is a segv in the
> memory allocator, we try to stay alive longer so that we can send
> MarkMeDown, and as a result continue processing some IO but in the
> meantime let something corrupt reach disk or clients or otherwise get
> worse and propagate?

IMHO it's entirely unnecessary. An "oops" assert should just abort() -
we are not the kernel, after all.

Thanks,

                Ilya