Re: RadosGW performance s3 many objects

Stefan Rogge <stefan.ceph@...> writes:

> 
> 
> Hi,
> we are using Ceph with RadosGW and an S3 setup.
> With more and more objects in the storage, the write speed slows down
> significantly. With 5 million objects in the storage we had a write speed
> of 10 MB/s. With 10 million objects it is only 5 MB/s.
> Is this a common issue?
> Is RadosGW suitable for a large number of objects, or would you
> recommend not using RadosGW with this many objects?
> 
> Thank you.
> 
> Stefan
> 
> I also found a ticket on the Ceph tracker about the same issue:
> 
> http://tracker.ceph.com/projects/ceph/wiki/Rgw_-_bucket_index_scalability
> 

Hi,

I'm struggling with the same issue on Ceph 9.2.0. Unfortunately I wasn't 
aware of it, and now the only way to improve things is to create a new 
bucket with bucket index sharding, or to change the way our apps store 
data in buckets. And of course copy tons of data :( In my case something 
also happened to the leveldb files, and now I cannot even run some 
radosgw-admin commands like:

radosgw-admin bucket check -b ....

which causes OSD daemon flapping and process timeout messages in the logs. 
PGs containing .rgw.bucket.index cannot even be backfilled to another OSD, 
as the OSD process dies with messages:

[...]
> 2016-01-25 15:47:22.700737 7f79fc66d700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f7992c86700' had suicide timed out after 150
> 2016-01-25 15:47:22.702619 7f79fc66d700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f79fc66d700 time 2016-01-25 15:47:22.700751
> common/HeartbeatMap.cc: 81: FAILED assert(0 == "hit suicide timeout")
> 
>  ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f7a019f4be5]
>  2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, long)+0x2d9) [0x7f7a019343b9]
>  3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0x7f7a01934bf6]
>  4: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0x7f7a019353bc]
>  5: (CephContextServiceThread::entry()+0x15b) [0x7f7a01a10dcb]
>  6: (()+0x7df5) [0x7f79ffa8fdf5]
>  7: (clone()+0x6d) [0x7f79fe3381ad]
> 
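For reference, a sketch of the sharding workaround mentioned above: since Hammer, the index of newly created buckets can be sharded by setting rgw_override_bucket_index_max_shards in ceph.conf (the section name below is a placeholder for your own rgw client section, and the shard count is only an illustrative value; existing buckets are not resharded by this):

```ini
[client.radosgw.gateway]
# Number of shards for the bucket index of NEWLY created buckets.
# 0 (the default) disables sharding; existing buckets keep their
# single-object index and must be recreated to benefit.
rgw_override_bucket_index_max_shards = 16
```

After restarting the radosgw process, buckets created from then on get a sharded index, which spreads index load across multiple RADOS objects and OSDs.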
I don't know - maybe it's because of the number of leveldb files in the 
omap folder (5.1 GB in total). I read somewhere that things can be 
improved by setting 'leveldb_compression' to false and 
'leveldb_compact_on_mount' to true, but I don't know if these options have 
any effect in 9.2.0, as they are not documented for this release. I tried 
'leveldb_compression' but without visible effect, and I wasn't brave 
enough to try 'leveldb_compact_on_mount' on the production env. Besides, 
setting it to true on my 0.94.5 test cluster makes the OSD fail on 
restart.

Kind regards -
Krzysztof Księżyk


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



