Re: Dealing with radosgw and large OSD LevelDBs: compact, start over, something else?

resend

On Mon, Dec 21, 2015 at 10:35 PM, Haomai Wang <haomai@xxxxxxxx> wrote:


On Fri, Dec 18, 2015 at 1:16 AM, Florian Haas <florian@xxxxxxxxxxx> wrote:
Hey everyone,

I recently got my hands on a cluster that has been underperforming
in terms of radosgw throughput, averaging about 60 PUTs/s with 70K
objects, whereas a freshly installed cluster with near-identical
configuration would do about 250 PUTs/s. (Neither of these values is
what I'd consider high throughput, but this is just to give you a
feel for the relative performance hit.)

Some digging turned up that of the fewer than 200 buckets in the
cluster, about 40 held in excess of a million objects (1-4M), with
one bucket being an outlier at 45M objects. All buckets were created
post-Hammer and use 64 index shards. The total number of objects in
radosgw is approx. 160M.
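
(In case anyone wants to repeat that exercise: the per-bucket object
counts above come from radosgw-admin, roughly like so, with <name>
standing in for each bucket:

  radosgw-admin bucket list
  radosgw-admin bucket stats --bucket=<name>

The num_objects field under "usage" in the stats output is what I'm
quoting.)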

Now, this isn't a large cluster in terms of OSD count; there are
only 12 OSDs (after all, we're only talking double-digit terabytes
here). Yet on almost all of these OSDs, the LevelDB omap directory
has grown to 10-20 GB.
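
(Those sizes are simply what a du on the FileStore omap directory
reports, i.e. something like

  du -sh /var/lib/ceph/osd/ceph-*/current/omap

on each OSD node, assuming the default OSD data path.)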

So I have several questions on this:

- Is it correct to assume that such a large LevelDB would be quite
detrimental to radosgw performance overall?

- If so, would clearing that one large bucket and distributing the
data over several new buckets reduce the LevelDB size at all?

- Is there even something akin to "ceph mon compact" for OSDs? (See
the note after this list for the candidates I've turned up so far.)

- Are these large LevelDB databases a simple consequence of having a
combination of many radosgw objects and few OSDs, with the
distribution per-bucket being comparatively irrelevant?
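
(Regarding the compaction question, the only knobs I'm aware of, and
I'd happily be corrected here, are compact-on-startup via ceph.conf,
e.g.

  [osd]
  leveldb_compact_on_mount = true

and offline compaction of a stopped OSD with ceph-kvstore-tool,
something like

  ceph-kvstore-tool /var/lib/ceph/osd/ceph-0/current/omap compact

though I believe newer releases want the store type spelled out, as
in "ceph-kvstore-tool leveldb <path> compact". Neither of which tells
me whether compaction would actually help, of course.)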

I do understand that the 45M-object bucket itself would have been a
problem pre-Hammer, with no index sharding available. But from what
others have shared here, one index shard per million objects seems a
sound rule of thumb, so 64 shards for 45M objects doesn't strike me
as totally off the mark. That's why I think LevelDB I/O is actually
the issue here. But I might be totally wrong; all insights
appreciated. :)

Did you enable bucket index sharding?
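
If not, you can set a default for all newly created buckets in
ceph.conf, something like

  [client.radosgw.gateway]
  rgw_override_bucket_index_max_shards = 64

(adjust the section name to your gateway instance). Note this only
takes effect at bucket creation time; existing buckets keep their
shard count.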

I'm not sure what the bottleneck in your cluster is, but I'd guess
you could disable LevelDB compression to test whether that reduces
the impact of compaction. You could check with perf top whether
decompression accounts for most of the CPU time while I/O utilization
isn't at 100%.
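
To spell that out, disabling compression would be

  [osd]
  leveldb_compression = false

(plus an OSD restart), and then on one of the OSD nodes something
like

  perf top -p $(pidof ceph-osd | tr ' ' ',')
  iostat -x 1

to see whether decompression dominates the CPU profile while the
disks stay below 100% util. I'm quoting the option name from memory,
so please double-check it against your release.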
 

Cheers,
Florian




--

Best Regards,

Wheat

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
