Bad performance while deleting many small objects via radosgw S3

Hi!

 

Our little dev Ceph cluster (nothing fancy: 3 hosts with one 100 GB OSD each, plus 3 monitor hosts that also run radosgw) takes over 20 minutes to delete ca. 44,000 small objects (< 1 GB in total).

Deletion is done by listing objects in blocks of 1000 and then deleting each block with a single multi-object delete call; each block of 1000 objects takes ca. 45 s to delete.
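
For reference, the delete loop is roughly equivalent to the following sketch (written with boto3 purely for illustration; the endpoint, credentials and bucket name are placeholders, not our real values):

import boto3
from botocore.config import Config

# Placeholders -- substitute your own radosgw endpoint, credentials and bucket.
ENDPOINT = "http://ceph-kl-mon1:7480"
BUCKET = "my-bucket"

s3 = boto3.client(
    "s3",
    endpoint_url=ENDPOINT,
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
    config=Config(s3={"addressing_style": "path"}),  # path-style URLs for radosgw
)

# List keys in pages of 1000 and issue one multi-object delete call per page.
paginator = s3.get_paginator("list_objects")
for page in paginator.paginate(Bucket=BUCKET, PaginationConfig={"PageSize": 1000}):
    keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
    if not keys:
        break
    s3.delete_objects(Bucket=BUCKET, Delete={"Objects": keys, "Quiet": True})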

 

The monitor/radosgw hosts have a load of 0.03; the OSD hosts show only ca. 25% CPU usage and ca. 5-10% iowait.

 

So nothing really looks like a bottleneck.

 

Any ideas on how to speed this up massively?

 

Pools:

 

# ceph osd pool ls detail

pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 557 flags hashpspool stripe_width 0

pool 1 '.rgw.root' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 200 pgp_num 200 last_change 558 flags hashpspool stripe_width 0

pool 2 'default.rgw.control' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 200 pgp_num 200 last_change 559 flags hashpspool stripe_width 0

pool 3 'default.rgw.data.root' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 200 pgp_num 200 last_change 560 flags hashpspool stripe_width 0

pool 4 'default.rgw.gc' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 200 pgp_num 200 last_change 561 flags hashpspool stripe_width 0

pool 5 'default.rgw.log' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 200 pgp_num 200 last_change 562 flags hashpspool stripe_width 0

pool 6 'default.rgw.users.uid' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 200 pgp_num 200 last_change 563 flags hashpspool stripe_width 0

pool 7 'default.rgw.users.keys' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 200 pgp_num 200 last_change 564 flags hashpspool stripe_width 0

pool 8 'default.rgw.meta' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 200 pgp_num 200 last_change 565 flags hashpspool stripe_width 0

pool 9 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 200 pgp_num 200 last_change 566 flags hashpspool stripe_width 0

pool 10 'default.rgw.buckets.data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 200 pgp_num 200 last_change 567 flags hashpspool stripe_width 0

 

Config:

 

[global]

fsid = cfaf0f4e-3b09-49e8-875b-4b114b0c4842

public_network = 0.0.0.0/0

mon_initial_members = ceph-kl-mon1

mon_host = 10.12.83.229, 10.12.81.212, 10.12.83.6

auth_cluster_required = cephx

auth_service_required = cephx

auth_client_required = cephx

filestore_xattr_use_omap = true

rgw zonegroup root pool = .rgw.root

osd pool default size = 2

osd pool default min size = 2

osd pool default pg num = 200

osd pool default pgp num = 200

mon_pg_warn_max_per_osd = 0

mon pg warn max object skew = 0

 

[osd]

osd op threads = 8

osd disk threads = 8

osd op queue = prio

osd recovery max active = 32

osd recovery threads = 4

 

[client.radosgw]

rgw zone = default

rgw zone root pool = .rgw.root

keyring = /etc/ceph/ceph.client.radosgw.keyring

rgw socket path = /var/run/ceph/ceph.radosgw.gateway.fastcgi.sock

log file = /var/log/radosgw/client.radosgw.gateway.log

rgw print continue = false

rgw cache enabled = true

rgw cache lru size = 50000

rgw num rados handles = 50

rgw num control oids = 16

rgw gc max objs = 1000

rgw exit timeout secs = 300

 

[client.radosgw.ceph-kl-mon1]

host = ceph-kl-mon1

rgw cache enabled = true

rgw cache lru size = 50000

rgw num rados handles = 50

rgw num control oids = 16

rgw gc max objs = 1000

rgw exit timeout secs = 300

 

[client.radosgw.ceph-kl-mon2]

host = ceph-kl-mon2

rgw cache enabled = true

rgw cache lru size = 50000

rgw num rados handles = 50

rgw num control oids = 16

rgw gc max objs = 1000

rgw exit timeout secs = 300

 

[client.radosgw.ceph-kl-mon3]

host = ceph-kl-mon3

rgw cache enabled = true

rgw cache lru size = 50000

rgw num rados handles = 50

rgw num control oids = 16

rgw gc max objs = 1000

rgw exit timeout secs = 300

 

 

As you can see, I have already tried some tweaks to the radosgw config, but with no positive effect.
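
(One thing I still want to look at, since the rgw gc settings above are related: radosgw defers at least part of the actual object removal to its garbage collector, so it might be worth checking whether anything is piling up in the GC queue, e.g. with

# radosgw-admin gc list --include-all | head
# radosgw-admin gc process

but I have not dug into that yet.)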

 

Or is radosgw just not designed for this kind of load (lots of really small objects)?

 

Thanks

 

Martin

