Re: RadosGW performance s3 many objects

Krzysztof Księżyk <kksiezyk@xxxxxxxxx> · Wed, 27 Jan 2016 23:27:39 +0100

On Sun, 2016-01-24 at 13:44 +0100, Stefan Rogge wrote: 

    Hi,

    we are using Ceph with RadosGW in an S3 setup.

    With more and more objects in the storage, the write speed slows down significantly. With 5 million objects in the storage we had a write speed of 10 MB/s. With 10 million objects in the storage it's only 5 MB/s.

    Is this a common issue?

    Is RadosGW suitable for a large number of objects, or would you recommend not using RadosGW with this many objects?

    Thank you.

    Stefan

    I also found a ticket on the Ceph tracker about the same issue:

    http://tracker.ceph.com/projects/ceph/wiki/Rgw_-_bucket_index_scalability
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Hi,

I'm struggling with the same issue on Ceph 9.2.0. Unfortunately I wasn't aware of it, and now the only way to improve things is to create a new bucket with bucket index sharding or to change the way our apps store data in buckets. And, of course, to copy tons of data :( In my case something also happened to the leveldb files, and now I cannot even run some radosgw-admin commands like:

radosgw-admin bucket check -b ....

which causes OSD daemon flapping and process timeout messages in the logs. PGs containing .rgw.bucket.index can't even be backfilled to another OSD, as the OSD process dies with messages:

[...]
2016-01-25 15:47:22.700737 7f79fc66d700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7992c86700' had suicide timed out after 150
2016-01-25 15:47:22.702619 7f79fc66d700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f79fc66d700 time 2016-01-25 15:47:22.700751
common/HeartbeatMap.cc: 81: FAILED assert(0 == "hit suicide timeout")

 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f7a019f4be5]
 2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, long)+0x2d9) [0x7f7a019343b9]
 3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0x7f7a01934bf6]
 4: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0x7f7a019353bc]
 5: (CephContextServiceThread::entry()+0x15b) [0x7f7a01a10dcb]
 6: (()+0x7df5) [0x7f79ffa8fdf5]
 7: (clone()+0x6d) [0x7f79fe3381ad]

I don't know, maybe it's because of the number of leveldb files in the omap folder (5.1GB in total). I read somewhere that things can be improved by setting 'leveldb_compression' to false and 'leveldb_compact_on_mount' to true, but I don't know whether these options have any effect in 9.2.0, as they are not documented for this release. I tried 'leveldb_compression' without any visible effect, and I wasn't brave enough to try 'leveldb_compact_on_mount' on the live cluster. Setting it to true on my 0.94.5 test cluster makes the OSD fail on restart.
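
For the "new bucket with index sharding" route mentioned above, a rough way to pick a shard count is to divide the expected number of objects per bucket by a target number of index entries per shard. The sketch below is only an illustration: the target of 100,000 objects per shard is an assumed rule of thumb, not a figure from this thread, and the function name is hypothetical. As far as I know, the resulting number would go into the rgw_override_bucket_index_max_shards setting, which only affects buckets created after it is set.

```python
# Assumption: aim for about 100,000 index entries per shard.
# This is a rule of thumb, not a number measured in this thread.
TARGET_OBJECTS_PER_SHARD = 100_000


def suggested_shards(expected_objects, per_shard=TARGET_OBJECTS_PER_SHARD):
    """Return a shard count keeping each index shard at or below per_shard entries."""
    if expected_objects < 0 or per_shard <= 0:
        raise ValueError("expected_objects must be >= 0 and per_shard must be > 0")
    # Integer ceiling division; always use at least one shard.
    return max(1, (expected_objects + per_shard - 1) // per_shard)


if __name__ == "__main__":
    # Object counts taken from the original question in this thread.
    for objects in (5_000_000, 10_000_000):
        print(objects, "objects ->", suggested_shards(objects), "shards")
```

The estimate should be based on the expected final size of the bucket rather than its current size, since existing buckets keep the shard count they were created with.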

Kind regards -

Krzysztof Księżyk 
