Re: Idea for speeding up RadosGW for buckets with many objects.

On Wed, Feb 17, 2016 at 12:51 PM, Krzysztof Księżyk <kksiezyk@xxxxxxxxx> wrote:
> Hi,
>
> I'm experiencing a problem with poor performance of RadosGW while operating
> on buckets with many objects. That's a known issue with LevelDB and can be
> partially resolved using sharding, but I have one more idea. As I see in the
> ceph osd logs, all slow requests occur while making calls to rgw.bucket_list:
>
> 2016-02-17 03:17:56.846694 7f5396f63700  0 log_channel(cluster) log [WRN] :
> slow request 30.272904 seconds old, received at 2016-02-17 03:17:26.573742:
> osd_op(client.12611484.0:15137332 .dir.default.4162.3 [call rgw.bucket_list]
> 9.2955279 ack+read+known_if_redirected e3252) currently started
>
> I don't know exactly how Ceph works internally, but maybe the data required
> to return results for rgw.bucket_list could be cached for some time. The
> cache TTL would be parameterized, and the cache could be disabled to keep
> the same behaviour as the current one. There are 3 cases when there's a call
> to rgw.bucket_list:
> 1. no cached data
> 2. up-to-date cache
> 3. outdated cache
>
> Ad 1. The first call starts generating the full list. All new requests are
> put on hold. When the list is ready, it's saved to the cache.
> Ad 2. All calls are served from the cache.
> Ad 3. The first request starts generating a fresh list. All new requests are
> served from the outdated cache until the new cached data is ready.
>
> This could be optimized even further by periodically regenerating the cache,
> even before it expires, to reduce the cases when the cache is outdated.
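
For concreteness, the scheme you describe boils down to something like the
sketch below (Python for illustration only; all names are made up, and the
real bucket-listing path is C++ inside rgw and the osds):

import threading
import time

class BucketListCache:
    # Sketch of the proposed TTL cache for rgw.bucket_list results.
    # Case 1 (no cached data): the first caller builds the list while
    #   later callers block until it is ready.
    # Case 2 (fresh cache): everyone is served from the cache.
    # Case 3 (stale cache): one caller rebuilds in the background
    #   while the rest keep getting the stale copy.

    def __init__(self, fetch_fn, ttl=30.0):
        self._fetch = fetch_fn          # the expensive full listing
        self._ttl = ttl                 # parameterized TTL (seconds)
        self._lock = threading.Lock()
        self._filled = threading.Event()
        self._refreshing = False
        self._data = None
        self._stamp = 0.0

    def get(self):
        with self._lock:
            if self._data is not None:
                if time.monotonic() - self._stamp < self._ttl:
                    return self._data              # case 2
                if not self._refreshing:           # case 3
                    self._refreshing = True
                    threading.Thread(target=self._rebuild,
                                     daemon=True).start()
                return self._data                  # serve stale copy
            if not self._refreshing:               # case 1
                self._refreshing = True
                threading.Thread(target=self._rebuild,
                                 daemon=True).start()
        self._filled.wait()                        # hold until ready
        with self._lock:
            return self._data

    def _rebuild(self):
        data = self._fetch()                       # generate full list
        with self._lock:
            self._data = data
            self._stamp = time.monotonic()
            self._refreshing = False
            self._filled.set()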

Where is the cache going to live? Note that for it to be on rgw, it
would need to be shared among all rgw instances (serving the same
zone). On the other hand, I'm not exactly sure how the osd could cache
it (there's no mechanism at the moment that would allow that). And
the cache itself would need to be part of the osd that serves the
specific bucket index; otherwise you'd need to go to multiple osds for
that operation, which would slow things down in the general case.
Note that we need things to be durable, otherwise we might end up
with inconsistencies when things don't go as expected (e.g., when an
rgw or osd goes down).

We did some thinking recently around the bucket index area, to see how
things could be improved. One way would be (for some use cases) to drop
it altogether. This could work in environments where 1. you don't need
to list objects in the bucket, and 2. there is no multi-zone sync. Another
possible mechanism would be to relax the bucket index update and
replace it with some kind of lazy update (maybe similar to what you
suggested), plus some way to rebuild the index out of the raw pool data
(maybe combining it with rados namespaces).
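
As a very rough illustration of that last idea: if each bucket's raw
objects lived in their own rados namespace, a rebuild pass could
enumerate them straight from the data pool. A sketch with the Python
librados bindings (the pool name, the namespace-per-bucket layout, and
the surrounding workflow are assumptions for the example, not how rgw
lays data out today):

import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    # Data pool name is an assumption for the example.
    ioctx = cluster.open_ioctx('default.rgw.buckets.data')
    # Assume one rados namespace per bucket (e.g. the bucket marker).
    ioctx.set_namespace('default.4162.3')
    # Enumerate the raw objects; a real rebuild would also recover
    # size/mtime/etag for each object before writing fresh bucket
    # index entries.
    for obj in ioctx.list_objects():
        print(obj.key)
    ioctx.close()
finally:
    cluster.shutdown()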

>
> Maybe this idea is stupid, maybe not, but if it's doable it would be nice
> to have the choice.

Thanks for the suggestions!

Yehuda
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



