I'm experiencing a problem with poor RadosGW performance when operating on buckets with many objects. That's a known issue with LevelDB and can be partially mitigated with bucket index sharding, but I have one more idea. As I can see in the Ceph OSD logs, all slow requests occur while making calls to rgw.bucket_list:
2016-02-17 03:17:56.846694 7f5396f63700 0 log_channel(cluster) log [WRN] : slow request 30.272904 seconds old, received at 2016-02-17 03:17:26.573742: osd_op(client.12611484.0:15137332 .dir.default.4162.3 [call rgw.bucket_list] 9.2955279 ack+read+known_if_redirected e3252) currently started
I don't know exactly how Ceph works internally, but maybe the data required to return results for rgw.bucket_list could be cached for some time. The cache TTL would be parametrized, and caching could be disabled entirely to keep the current behaviour. There are 3 cases when rgw.bucket_list is called:
1. no cached data
2. up-to-date cache
3. outdated cache
Case 1: The first call starts generating the full list. All new requests are put on hold. When the list is ready, it's saved to the cache.
Case 2: All calls are served from the cache.
Case 3: The first request starts generating a fresh list. All new requests are served from the outdated cache until the new cached data is ready.
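To make the idea concrete, here is a minimal sketch in Python of those three cases. It is purely illustrative and not based on any actual RGW code; BucketListCache and generate_full_list are hypothetical stand-ins for the expensive bucket index listing, and only the TTL/staleness logic is modelled:

# Purely illustrative sketch -- not actual Ceph/RGW code. BucketListCache
# and generate_full_list are hypothetical stand-ins for the bucket index
# listing; only the TTL/staleness logic of the three cases is modelled.
import threading
import time

class BucketListCache:
    def __init__(self, generate_full_list, ttl=30.0):
        self._generate = generate_full_list  # the expensive full listing
        self._ttl = ttl                      # parametrized cache TTL
        self._cond = threading.Condition()
        self._refreshing = False             # True while a rebuild is running
        self._data = None
        self._stamp = 0.0                    # time the cache was last filled

    def get(self):
        with self._cond:
            fresh = (self._data is not None
                     and time.time() - self._stamp < self._ttl)
            if fresh:
                return self._data            # case 2: serve up-to-date cache
            if self._refreshing:
                if self._data is not None:
                    return self._data        # case 3: serve the outdated copy
                while self._refreshing:      # case 1: hold until the list is ready
                    self._cond.wait()
                return self._data
            self._refreshing = True          # this caller rebuilds the list

        try:
            new = self._generate()           # full listing, done outside the lock
            with self._cond:
                self._data, self._stamp = new, time.time()
                return self._data
        finally:
            with self._cond:
                self._refreshing = False
                self._cond.notify_all()      # wake any case-1 waiters

The point of the Condition is that only one caller ever pays for the full listing at a time; everyone else either waits (case 1) or keeps getting the previous result (case 3).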
The scheme above could be further optimized by periodically regenerating the cache in the background, even before it expires, to reduce how often callers hit the outdated-cache case; a sketch of such a refresher follows.
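Again hypothetical, built on top of the same sketch (start_background_refresh and its period parameter are made-up names); it forces the entry to look expired shortly before clients would notice and lets get() rebuild it, while concurrent readers still get the previous copy:

# Hypothetical background refresher for the sketch above. 'period' would
# typically be set somewhat below the TTL.
def start_background_refresh(cache, period):
    def loop():
        while True:
            time.sleep(period)
            try:
                with cache._cond:            # force the entry to look stale
                    cache._stamp = 0.0
                cache.get()                  # rebuild; readers still see the old copy
            except Exception:
                pass                         # keep the refresher alive on failures
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t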
Maybe this idea is stupid, maybe not, but if it's doable, it would be nice to have the choice.
Kind regards -
Krzysztof Księżyk