On Wed, Mar 16, 2016 at 06:36:33AM +0000, Pavan Rallabhandi wrote:
> I find this to be discussed here before, but couldn't find any solution
> hence the mail. In RGW, for a bucket holding objects in the range of ~
> millions, one can find it to take for ever to delete the bucket (via
> radosgw-admin). I understand the gc (and its parameters) that would
> reclaim the space eventually, but am looking more at the bucket deletion
> options that can possibly speed up the operation.

This ties in well with a mail I had sitting in my drafts but never got
around to sending.

While doing some rough benchmarking on bucket index sharding, I ran into
some terrible performance for deletion of non-existent keys. Shards did
NOT alleviate this performance issue, but did help elsewhere. The numbers
given below are for unsharded buckets; relatively empty buckets perform
worse when sharded, before performance picks up again.

Test methodology (a rough sketch of the measurement loop is appended at
the end of this mail):
- Fire single DELETE key ops at the RGW; not using multi-object delete.
- I measured the time taken for each delete, and report the 99th
  percentile here (1% of operations took longer than this).
- I took at least 1K samples for bucket sizes up to and including 10k
  keys per bucket. For 50k keys/bucket I capped it at the first 100
  samples instead of waiting 10 hours for the run to complete.
- The DELETE operations were run single-threaded, with no concurrency.

Test environments:
Both clusters were running Hammer 0.94.5 on Ubuntu precise; the hardware
is a long way from being new; there are no SSDs, and the journal is the
first partition on each OSD's disk. The test source host was unloaded,
and approximately 1ms of latency away from the RGWs.

Cluster 1 (Congress, ~1350 OSDs; production cluster; haproxy of 10 RGWs)
#keys-in-bucket    time per single key delete
0                  6.899ms
10                 7.507ms
100                13.573ms
1000               327.936ms
10000              4825.597ms
50000              33802.497ms
100000             did-not-finish

Cluster 2 (Benjamin, ~50 OSDs; test cluster, practically idle; haproxy of 2 RGWs)
#keys-in-bucket    time per single key delete
0                  4.825ms
10                 6.749ms
100                6.146ms
1000               6.816ms
10000              1233.727ms
50000              64262.764ms
100000             did-not-finish

The cases marked did-not-finish are where the RGW seems to time out the
operation, even with the client having an unlimited timeout. This also
occurred when connecting directly to CivetWeb rather than going through
HAProxy.

I'm not sure why the 100-key case on the second cluster seems to have
been faster than the 10-key case, but I'm willing to put it down to
statistical noise. The huge increase at the end, and the operation never
returning at all for the 100k-key bucket, are concerning.

--
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead, Foundation Trustee
E-Mail   : robbat2@xxxxxxxxxx
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
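
P.S. For anyone who wants to reproduce this, here is a minimal sketch of
the measurement loop described above. It is not the exact script used to
produce the numbers; boto3, the endpoint, bucket name, and credentials are
all placeholders to be filled in for your own setup.

#!/usr/bin/env python3
# Sketch only: single-threaded DELETEs of non-existent keys against an RGW
# S3 endpoint, reporting the 99th-percentile latency. All names below
# (ENDPOINT, BUCKET, credentials) are placeholders, not values from the test.
import time
import boto3

ENDPOINT = "http://rgw.example.com:7480"   # placeholder RGW endpoint
BUCKET = "benchmark-bucket"                # placeholder pre-created bucket
SAMPLES = 1000                             # at least 1K samples per bucket size

s3 = boto3.client(
    "s3",
    endpoint_url=ENDPOINT,
    aws_access_key_id="ACCESS_KEY",        # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
)

latencies = []
for i in range(SAMPLES):
    key = "nonexistent-%06d" % i           # keys that were never created
    start = time.monotonic()
    # Single-object DELETE, one at a time, no concurrency.
    s3.delete_object(Bucket=BUCKET, Key=key)
    latencies.append(time.monotonic() - start)

latencies.sort()
p99 = latencies[int(len(latencies) * 0.99) - 1]
print("99th percentile delete latency: %.3f ms" % (p99 * 1000.0))

Run it once per bucket size (0, 10, 100, ... keys pre-loaded) to get a
table like the ones above; deleting a non-existent key returns success in
S3, so the loop measures only the index-side cost of the operation.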