proposal : read/write threads and handles separation in civetweb and rgw

Abhishek Varshney <abhishek.varshney@xxxxxxxxxxxx> · Mon, 10 Jul 2017 17:17:26 +0530

TL;DR
---------
The proposal is to separate read and write threads/handles in
civetweb/rgw to reduce the blast radius of an outage caused by one
type of op (GET or PUT) being blocked or latent. Proposal
PR : https://github.com/ceph/civetweb/pull/21

Problem Statement
------------------------
Our production clusters, primarily running object gateway workloads on
hammer, have on several occasions seen one type of op (GET or PUT)
being blocked or latent for different reasons. This has resulted in a
complete outage, with rgw becoming totally unresponsive and unable to
accept connections. After root-causing the issue, we found that
there is no separation of resources (threads and handles) at the
civetweb and rgw layers, so one blocked op type causes a complete
blackout.

Scenarios
--------------
Some scenarios known to block one kind of op (GET or PUT):

* PUTs are blocked when the pool holding the bucket index is degraded.
We have large omap objects, recovery/rebalancing of which is known to
block PUT ops for long periods (around a couple of hours). We are
also working to address this issue separately.

* GETs are blocked when the rgw data pool (which is front-ended by a
writeback cache tier on a different crush root) is degraded.

There could be other such scenarios too.

Proposed Approach
---------------------------
The proposal here is to separate read and write resources in terms of
threads in civetweb and rados handles in rgw which would help to limit
the blast radius and reduce the impact of any outage that may happen.

* civetweb : currently, civetweb has a common pool of worker
threads which consume sockets from a single queue. When requests are
blocked in ceph, the queue fills up and the civetweb master
thread is stuck in a loop waiting for the queue to drain [1],
unable to process any more requests.

The proposal is to introduce 2 additional queues, a read connection
queue and a write connection queue, along with a dispatcher thread
which picks sockets from the socket queue and places them on one of
these queues based on the type of the op. If a queue is full,
the dispatcher thread would return a 503 instead of waiting for that
queue to drain.

This should confine failures to the affected type of op and thus
improve the availability of the clusters.

The ideas described above are presented in the form of a PR here :
https://github.com/ceph/civetweb/pull/21

* rgw : while the proposed changes in civetweb should give major
returns, a next level of optimisation can be done in rgw, where the
rados handles can likewise be separated based on the type of op, so
that civetweb worker threads don't end up contending on rados handles.
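
To make the dispatcher idea above concrete, here is a minimal sketch.
It is written in Python rather than civetweb's C and is not taken from
the PR. The names (make_queues, classify, dispatch), the choice of
GET/HEAD as "read" methods, and the assumption that the dispatcher
already knows each connection's HTTP method are all hypothetical and
only for illustration.

```python
import queue

# Assumption: which HTTP methods count as "read" ops; everything else
# is treated as a write. The actual PR may classify differently.
READ_METHODS = frozenset({"GET", "HEAD"})


def make_queues(read_depth, write_depth):
    """Create one bounded connection queue per op type."""
    return {
        "read": queue.Queue(maxsize=read_depth),
        "write": queue.Queue(maxsize=write_depth),
    }


def classify(method):
    """Map an HTTP method to the queue it should be routed to."""
    return "read" if method.upper() in READ_METHODS else "write"


def dispatch(queues, conn):
    """Route one accepted connection without ever blocking.

    Returns None if the connection was queued for a worker, or 503 if
    the target queue is full and the request should be rejected.
    """
    kind = classify(conn["method"])
    try:
        queues[kind].put_nowait(conn)
    except queue.Full:
        return 503
    return None


def demo():
    """Simulate blocked PUTs: nothing drains the write queue."""
    queues = make_queues(read_depth=2, write_depth=2)
    results = [dispatch(queues, {"method": "PUT", "id": i}) for i in range(3)]
    # A GET arriving after the write queue is full is still accepted.
    results.append(dispatch(queues, {"method": "GET", "id": 3}))
    return results
```

In demo(), the third PUT is rejected with a 503 because the write
queue is full, while the GET that follows is still queued. That is the
isolation the proposal is after: a stalled op type only exhausts its
own queue.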

Would love to hear suggestions, opinions and feedback from the community.

PS : Due to the lack of a proper branch that tracks the latest
civetweb, and as per the suggestions received on the irc channel,
the PR is raised against the wip-listen4 branch of civetweb.

1. https://github.com/ceph/civetweb/blob/wip-listen4/src/civetweb.c#L12558

Thanks
Abhishek Varshney