Re: RadosGW performance s3 many objects


 



Hello Guys 

Are several million objects with Ceph (for the RGW use case) still an issue? Or has this been fixed?

Thnx
Vickey

On Thu, Jan 28, 2016 at 12:55 AM, Krzysztof Księżyk <kksiezyk@xxxxxxxxx> wrote:
Stefan Rogge <stefan.ceph@...> writes:

>
> Hi,
> we are using Ceph with the RadosGW and S3 setup.
> With more and more objects in the storage, the writing speed slows down
> significantly. With 5 million objects in the storage we had a writing speed
> of 10MB/s. With 10 million objects in the storage it's only 5MB/s.
> Is this a common issue?
> Is the RadosGW suitable for a large number of objects, or would you
> recommend not using the RadosGW with this amount of objects?
>
> Thank you.
>
> Stefan
>
> I also found a ticket on the ceph tracker for the same issue:
>
> http://tracker.ceph.com/projects/ceph/wiki/Rgw_-_bucket_index_scalability
>

Hi,

I'm struggling with the same issue on Ceph 9.2.0. Unfortunately I wasn't
aware of it, and now the only way to improve things is to create a new bucket
with bucket index sharding (see the sketch just below), or to change the way
our apps store data in buckets. And, of course, copy tons of data :(
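In case it saves someone else the trouble: sharding only applies to buckets
created after it is enabled, which is why the data has to be copied. A minimal
sketch of what I mean, assuming rgw_override_bucket_index_max_shards behaves in
9.2.0 the way it did when it was introduced, and with the client section name
and shard count only as examples:

  # ceph.conf on the radosgw host; restart radosgw afterwards
  [client.radosgw.gateway]
  rgw override bucket index max shards = 64

Buckets created through that gateway afterwards get their index spread over 64
RADOS objects instead of a single one; existing buckets are not touched.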
In my case something also happened to the leveldb files, and now I cannot even
run some radosgw-admin commands such as:

radosgw-admin bucket check -b ....

which causes OSD daemon flapping and process timeout messages in the logs. PGs
containing .rgw.bucket.index can't even be backfilled to another OSD, as the
OSD process dies with messages like:

[...]
> 2016-01-25 15:47:22.700737 7f79fc66d700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7992c86700' had suicide timed out after 150
> 2016-01-25 15:47:22.702619 7f79fc66d700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f79fc66d700 time 2016-01-25 15:47:22.700751
> common/HeartbeatMap.cc: 81: FAILED assert(0 == "hit suicide timeout")
>
>  ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f7a019f4be5]
>  2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, long)+0x2d9) [0x7f7a019343b9]
>  3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0x7f7a01934bf6]
>  4: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0x7f7a019353bc]
>  5: (CephContextServiceThread::entry()+0x15b) [0x7f7a01a10dcb]
>  6: (()+0x7df5) [0x7f79ffa8fdf5]
>  7: (clone()+0x6d) [0x7f79fe3381ad]
>
>
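In case it helps anyone else debugging this: one way to see how big a bucket
index has grown, without going through radosgw-admin, is to count the omap keys
on the index object directly with plain rados commands. A sketch, assuming the
default .rgw.buckets.index pool name; .dir.<bucket_id> stands for the index
object of the affected bucket:

  rados -p .rgw.buckets.index ls
  rados -p .rgw.buckets.index listomapkeys .dir.<bucket_id> | wc -l

For a huge index the listing itself can take a long time, so I would be careful
running it against an already struggling OSD.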
I don't know; maybe it's because of the number of leveldb files in the omap
folder (5.1GB in total). I read somewhere that things can be improved by setting
'leveldb_compression' to false and 'leveldb_compact_on_mount' to true, but I
don't know whether these options have any effect in 9.2.0, as they are not
documented for this release. I tried 'leveldb_compression' without any visible
effect, and I wasn't brave enough to try 'leveldb_compact_on_mount' on the
production environment. Setting it to true on my 0.94.5 test cluster makes the
OSD fail on restart.
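For reference, this is how I would expect those two options to be set if they
are still honoured in 9.2.0; a sketch only, as I have not verified the option
names against this release:

  [osd]
  leveldb compression = false
  leveldb compact on mount = true

By its name, the compaction option only does anything when the OSD (re)mounts
its store, i.e. across a restart of each OSD.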

Kind regards -
Krzysztof Księżyk


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

