On Tue, 11 Jul 2017, Matt Benjamin wrote:
> Hi Abhishek,
>
> There are plans in place to provide for enhanced scheduling and
> fairness intrinsically, somewhat in tandem with the new front-end
> (boost::asio/beast) and librados interfacing work by Adam. I'm not
> clear whether this proposal advances that goal or not. It seems like
> it adds complexity that we won't want to retain for the long term,
> but maybe it's helpful in ways I don't understand yet.
>
> It seems like it would definitely make sense to have a focused
> discussion in one of our standups of the broader issues, the
> approaches being taken, and so on.

The (currently empty) agenda for the next CDM (Aug 2) is here:

http://tracker.ceph.com/projects/ceph/wiki/CDM_02-AUG-2017?parent=Planning

sage

> regards,
>
> Matt
>
> On Mon, Jul 10, 2017 at 7:47 AM, Abhishek Varshney
> <abhishek.varshney@xxxxxxxxxxxx> wrote:
> >
> > TL;DR
> > ---------
> > The proposal is to separate out the read and write threads/handles
> > in civetweb/rgw to reduce the blast radius in case of an outage
> > caused by one type of op (GET or PUT) being blocked or latent.
> > Proposal PR: https://github.com/ceph/civetweb/pull/21
> >
> > Problem Statement
> > ------------------------
> > Our production clusters, primarily running object gateway workloads
> > on hammer, have quite a few times seen one type of op (GET or PUT)
> > being blocked or latent for different reasons. This has resulted in
> > a complete outage, with rgw becoming totally unresponsive and
> > unable to accept connections. After root-causing the issue, we
> > found that there is no separation of resources (threads and
> > handles) at the civetweb and rgw layers, which causes a complete
> > blackout.
> >
> > Scenarios
> > --------------
> > Some scenarios that are known to block one kind of op (GET or PUT):
> >
> > * PUTs are blocked when the pool holding the bucket index is
> > degraded. We have large omap objects, recovery/rebalancing of which
> > is known to block PUT ops for long durations (~ a couple of hours).
> > We are also working to address this issue separately.
> >
> > * GETs are blocked when the rgw data pool (which is front-ended by
> > a writeback cache tier on a different crush root) is degraded.
> >
> > There could be other such scenarios too.
> >
> > Proposed Approach
> > ---------------------------
> > The proposal here is to separate read and write resources, in terms
> > of threads in civetweb and rados handles in rgw, which would help
> > limit the blast radius and reduce the impact of any outage that may
> > happen.
> >
> > * civetweb: currently in civetweb there is a common pool of worker
> > threads which consume sockets to process from a single queue. In
> > case of blocked requests in ceph, the queue becomes full, the
> > civetweb master thread gets stuck in a loop waiting for the queue
> > to become empty [1], and it is unable to accept any more requests.
> >
> > The proposal is to introduce 2 additional queues, a read connection
> > queue and a write connection queue, along with a dispatcher thread
> > which picks sockets from the socket queue and puts them on one of
> > these queues based on the type of the op. In case a queue is full,
> > the dispatcher thread would return a 503 instead of waiting for
> > that queue to be empty again.
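> >
> > To make the intended behaviour concrete, here is a rough standalone
> > sketch of the dispatch logic (illustrative only: this is not the
> > actual code from the PR, and every name in it is made up):
> >
> > #include <chrono>
> > #include <condition_variable>
> > #include <cstddef>
> > #include <iostream>
> > #include <mutex>
> > #include <queue>
> > #include <string>
> > #include <thread>
> >
> > struct Conn { std::string method; int fd; };  // stand-in for a socket
> >
> > // Bounded queue whose try_push fails instead of blocking when full.
> > class BoundedQueue {
> >   std::queue<Conn> q_;
> >   std::mutex m_;
> >   std::condition_variable cv_;
> >   const std::size_t cap_;
> > public:
> >   explicit BoundedQueue(std::size_t cap) : cap_(cap) {}
> >   bool try_push(const Conn& c) {
> >     std::lock_guard<std::mutex> lk(m_);
> >     if (q_.size() >= cap_) return false;  // full: caller sheds load
> >     q_.push(c);
> >     cv_.notify_one();
> >     return true;
> >   }
> >   Conn pop() {
> >     std::unique_lock<std::mutex> lk(m_);
> >     cv_.wait(lk, [&] { return !q_.empty(); });
> >     Conn c = q_.front();
> >     q_.pop();
> >     return c;
> >   }
> > };
> >
> > int main() {
> >   BoundedQueue read_q(2), write_q(2);
> >
> >   // Each worker pool drains only its own queue, so PUTs blocked on
> >   // a degraded pool cannot starve GETs (and vice versa).
> >   std::thread([&] { for (;;) read_q.pop(); }).detach();  // GET pool
> >   std::thread([&] {                                      // PUT pool
> >     for (;;) {
> >       write_q.pop();
> >       std::this_thread::sleep_for(std::chrono::hours(1));  // "stuck" PUT
> >     }
> >   }).detach();
> >
> >   // Dispatcher: route by op type; 503 instead of blocking when full.
> >   for (const Conn& c : {Conn{"GET", 3}, Conn{"PUT", 4}, Conn{"PUT", 5},
> >                         Conn{"PUT", 6}, Conn{"PUT", 7}}) {
> >     BoundedQueue& target = (c.method == "GET") ? read_q : write_q;
> >     if (!target.try_push(c))
> >       std::cout << "fd " << c.fd << ": 503 Service Unavailable\n";
> >   }
> > }
> >
> > The key point is that the dispatcher never blocks: once the write
> > queue fills up, further PUTs turn into fast 503s while GETs keep
> > flowing through their own queue and workers.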
> >
> > This is expected to limit failures and thus improve the
> > availability of the clusters.
> >
> > The ideas described above are presented in the form of a PR here:
> > https://github.com/ceph/civetweb/pull/21
> >
> > * rgw: while the proposed changes in civetweb should give the major
> > returns, the next level of optimisation can be done in rgw, where
> > the rados handles can again be separated based on the type of op,
> > so that civetweb worker threads don't end up contending on rados
> > handles (a rough sketch of this idea follows at the end of this
> > mail).
> >
> > Would love to hear suggestions, opinions and feedback from the
> > community.
> >
> > PS: Due to the lack of a branch that tracks the latest upstream
> > civetweb, and as per the suggestions received on the IRC channel,
> > the PR is raised against the wip-listen4 branch of civetweb.
> >
> > 1. https://github.com/ceph/civetweb/blob/wip-listen4/src/civetweb.c#L12558
> >
> > Thanks
> > Abhishek Varshney
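> >
> > The rados-handle separation mentioned above, again as a standalone
> > and purely illustrative sketch: RadosHandle is just a stand-in for
> > a connected librados handle, and the 4/4 split is an assumption
> > (e.g. dividing what rgw_num_rados_handles provisions today):
> >
> > #include <atomic>
> > #include <cstddef>
> > #include <vector>
> >
> > // Stand-in for a connected librados handle.
> > struct RadosHandle { int id; };
> >
> > enum class OpType { Read, Write };
> >
> > // Two independent handle pools, picked by op type, so GET traffic
> > // never queues behind PUT traffic on a shared handle.
> > class HandleSet {
> >   std::vector<RadosHandle> reads_, writes_;
> >   std::atomic<std::size_t> rr_read_{0}, rr_write_{0};
> > public:
> >   HandleSet(std::size_t n_read, std::size_t n_write) {
> >     for (std::size_t i = 0; i < n_read + n_write; ++i)
> >       (i < n_read ? reads_ : writes_).push_back({static_cast<int>(i)});
> >   }
> >   RadosHandle& pick(OpType op) {  // round-robin in the matching pool
> >     if (op == OpType::Read)
> >       return reads_[rr_read_++ % reads_.size()];
> >     return writes_[rr_write_++ % writes_.size()];
> >   }
> > };
> >
> > int main() {
> >   HandleSet handles(4, 4);  // e.g. split an 8-handle budget 4/4
> >   RadosHandle& for_get = handles.pick(OpType::Read);   // GET path
> >   RadosHandle& for_put = handles.pick(OpType::Write);  // PUT path
> >   (void)for_get; (void)for_put;
> > }
> >
> > A worker serving a GET would then only ever touch read handles, so
> > slow writes cannot exhaust the handles that reads depend on.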