rgw: throttling logic in the PutObjProcessor stack

Dear list,

I'm planning to do some refactoring in the RGWPutObjProcessor stack as part of the async request processing project, which involves replacing any blocking waits on AioCompletion::wait_for_safe() with ones that suspend/resume the coroutine from the beast frontend.

Most of this blocking happens in throttle_data() down in RGWPutObjProcessor_Aio, which is called after each buffer is passed to handle_data(). If handle_data() results in a write to rados, it returns a 'void *handle' for the AioCompletion, which is then passed back to throttle_data(), where it's registered as 'pending' and waited on if necessary. See put_data_and_throttle() in rgw_op.h [1] for a canonical example of this handle_data()/throttle_data() loop, which is duplicated in several other places.
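For anyone who hasn't looked at [1] recently, the loop looks roughly like this (a simplified sketch from memory, so the exact signatures and error handling may differ from the tree):

    int put_data_and_throttle(RGWPutObjDataProcessor *processor,
                              bufferlist& data, off_t ofs, bool need_to_wait)
    {
      bool again = false;
      do {
        void *handle = nullptr;
        rgw_raw_obj obj;
        uint64_t size = data.length();

        // may or may not issue a rados write; if it does, 'handle' refers to
        // the AioCompletion for that write
        int r = processor->handle_data(data, ofs, &handle, &obj, &again);
        if (r < 0) {
          return r;
        }
        // pass the handle back down so the write can be tracked and waited on
        r = processor->throttle_data(handle, obj, size, need_to_wait);
        if (r < 0) {
          return r;
        }
        need_to_wait = false; // only applies to the first write
      } while (again);
      return 0;
    }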

This control flow became a bit more convoluted with the addition of the PutObj filters (to support compression in jewel, and encryption in luminous), which are stacked on top of the PutObjProcessor. Now this AioCompletion handle is being passed all the way up the stack, and then back down again for throttle_data().
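To illustrate, a filter in this stack ends up doing something like the following (a minimal sketch only, not the actual compression/encryption code; the class and member names here are stand-ins):

    // hypothetical filter, modeled loosely on the compression/encryption filters
    int PutObjFilter_Example::handle_data(bufferlist& bl, off_t ofs,
                                          void **phandle, rgw_raw_obj *pobj,
                                          bool *again)
    {
      bufferlist transformed;
      transform(bl, transformed); // compress/encrypt, possibly buffering input
      // *phandle is filled in by whichever layer below actually writes to
      // rados, then bubbles back up through every filter to the caller
      return next->handle_data(transformed, ofs, phandle, pobj, again);
    }

    int PutObjFilter_Example::throttle_data(void *handle, const rgw_raw_obj& obj,
                                            uint64_t size, bool need_to_wait)
    {
      // the same handle travels all the way back down, along with a size that
      // still reflects the caller's untransformed view of the data
      return next->throttle_data(handle, obj, size, need_to_wait);
    }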

This model has several issues:

* If one call to a filter's handle_data() function generates multiple rados writes, only the final one is returned all the way up the stack and passed back to throttle_data(). This is generally avoided with the 'bool *again' flag, but it requires the application logic, i.e. put_data_and_throttle(), to keep passing the same buffer through handle_data()/throttle_data() until again==false.

* Where compression is involved, the application is dealing with uncompressed buffer sizes, but we want to throttle based on the compressed size of the rados writes instead.

* Throttling is based on the size of the last bufferlist passed to handle_data() at the top of the stack. Some filters do internal buffering, and RGWPutObjProcessor_Atomic itself buffers up data from multiple calls until it has rgw_max_chunk_size to write at once. So the final call to handle_data() may be much smaller than the write it triggers, yet that's the size argument passed to throttle_data() [2] (see the sketch below).
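To make that last point concrete, the buffering in the atomic processor looks roughly like this (member and helper names here are approximate, not copied from the tree):

    int RGWPutObjProcessor_Atomic::handle_data(bufferlist& bl, off_t ofs,
                                               void **phandle, rgw_raw_obj *pobj,
                                               bool *again)
    {
      pending_data_bl.claim_append(bl);      // e.g. a 4k tail appended here
      if (pending_data_bl.length() < max_chunk_size) {
        *phandle = nullptr;                  // nothing written, nothing to throttle
        *again = false;
        return 0;
      }
      // submits an aio write of max_chunk_size (e.g. 4M) in one go, but the
      // caller will charge throttle_data() with only the 4k it passed in above
      return write_chunk(pending_data_bl, ofs, phandle, pobj, again);
    }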

On the other hand, one potential advantage of this model is that the application can do some extra work between the calls to handle_data() and throttle_data(), for a small gain in parallelism. The only thing that currently does this is fetch_remote_obj() for opstate tracking, but I believe that's obsolete and have a PR [3] to remove it.


I'd like to propose that we invert this control flow so that throttle_data() is called by RGWPutObjProcessor_Aio immediately after submitting each aio write to rados. That way any blocking happens at the bottom of the stack before returning. Not only does that address the issues listed above, but it also prevents the AioCompletion-based implementation details from leaking into these interfaces. That in turn will make it easier to plug in a different strategy for use with beast, which will likely combine AioCompletion callbacks with asio-style asynchronous waits. And if another case arises where we want to perform some extra work before throttling, we could accomplish that by passing some kind of callback interface into RGWPutObjProcessor_Aio.
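Concretely, the bottom of the stack would look something like this (again just a sketch; submit_aio_write() is a stand-in for whatever ends up issuing the librados aio):

    int RGWPutObjProcessor_Aio::write_data(bufferlist&& bl, const rgw_raw_obj& obj)
    {
      void *handle = nullptr;
      int r = submit_aio_write(obj, bl, &handle); // issue the librados aio
      if (r < 0) {
        return r;
      }
      // throttle right here, on the size of the rados write we actually made.
      // with the beast frontend, this is where we'd suspend the coroutine
      // instead of blocking on AioCompletion::wait_for_safe()
      return throttle_data(handle, obj, bl.length(), true);
    }

With that, the filters and application logic above only need a plain "take this bufferlist" interface, with none of the handle/'again' plumbing.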

Any feedback/objections/alternatives?

Thanks,
Casey

[1] https://github.com/ceph/ceph/blob/e03d228ab08049ba3b7fc64533d299868640cf17/src/rgw/rgw_op.h#L1859-L1886
[2] http://tracker.ceph.com/issues/24594
[3] https://github.com/ceph/ceph/pull/24059


