Wade, I'm having the same problem as you. We currently have 5+ million objects in a bucket and it is not even sharded, so we observe many problems with that. Did you manage to test RGW with tons of files?

2016-05-24 2:45 GMT+03:00 Wade Holler <wade.holler@xxxxxxxxx>:
> We (my customer) are trying to test on Jewel now, but I can say that the
> above behavior was also observed by my customer on Infernalis. After 300
> million or so objects in a single bucket, the cluster basically fell down as
> described above. A few hundred OSDs in this cluster. We are concerned that
> this may not be remedied by a hundreds-of-buckets approach either. Testing
> continues.
>
> On Mon, May 23, 2016 at 7:35 PM Vickey Singh <vickey.singh22693@xxxxxxxxx>
> wrote:
>>
>> Hello Guys
>>
>> Are several million objects with Ceph (for the RGW use case) still an
>> issue? Or is it fixed?
>>
>> Thanks
>> Vickey
>>
>> On Thu, Jan 28, 2016 at 12:55 AM, Krzysztof Księżyk <kksiezyk@xxxxxxxxx>
>> wrote:
>>>
>>> Stefan Rogge <stefan.ceph@...> writes:
>>>
>>> > Hi,
>>> > we are using Ceph with RadosGW and the S3 API.
>>> > With more and more objects in the storage, the write speed slows down
>>> > significantly. With 5 million objects in the storage we had a write
>>> > speed of 10MB/s. With 10 million objects in the storage it's only 5MB/s.
>>> > Is this a common issue?
>>> > Is RadosGW suitable for a large number of objects, or would you
>>> > recommend not using RadosGW with this many objects?
>>> >
>>> > Thank you.
>>> >
>>> > Stefan
>>> >
>>> > I also found a ticket on the Ceph tracker describing the same issue:
>>> >
>>> > http://tracker.ceph.com/projects/ceph/wiki/Rgw_-_bucket_index_scalability
>>> >
>>>
>>> Hi,
>>>
>>> I'm struggling with the same issue on Ceph 9.2.0. Unfortunately I wasn't
>>> aware of it, and now the only way to improve things is to create a new
>>> bucket with bucket index sharding or to change the way our apps store
>>> data into buckets. And of course copy tons of data :( In my case
>>> something also happened to the leveldb files and now I cannot even run
>>> some radosgw-admin commands like:
>>>
>>> radosgw-admin bucket check -b ....
>>>
>>> which causes OSD daemon flapping and process timeout messages in the
>>> logs. PGs containing .rgw.bucket.index cannot even be backfilled to other
>>> OSDs, as the OSD process dies with messages:
>>>
>>> [...]
>>> > 2016-01-25 15:47:22.700737 7f79fc66d700  1 heartbeat_map is_healthy
>>> > 'OSD::osd_op_tp thread 0x7f7992c86700' had suicide timed out after 150
>>> > 2016-01-25 15:47:22.702619 7f79fc66d700 -1 common/HeartbeatMap.cc: In
>>> > function 'bool ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*,
>>> > const char*, time_t)' thread 7f79fc66d700 time 2016-01-25 15:47:22.700751
>>> > common/HeartbeatMap.cc: 81: FAILED assert(0 == "hit suicide timeout")
>>> >
>>> >  ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>>> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> > const*)+0x85) [0x7f7a019f4be5]
>>> >  2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char
>>> > const*, long)+0x2d9) [0x7f7a019343b9]
>>> >  3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0x7f7a01934bf6]
>>> >  4: (ceph::HeartbeatMap::check_touch_file()+0x2c) [0x7f7a019353bc]
>>> >  5: (CephContextServiceThread::entry()+0x15b) [0x7f7a01a10dcb]
>>> >  6: (()+0x7df5) [0x7f79ffa8fdf5]
>>> >  7: (clone()+0x6d) [0x7f79fe3381ad]
>>> >
>>> I don't know - maybe it's because of the number of leveldb files in the
>>> omap folder (5.1GB in total). I read somewhere that things can be improved
>>> by setting 'leveldb_compression' to false and 'leveldb_compact_on_mount'
>>> to true, but I don't know if these options have any effect in 9.2.0 as
>>> they are not documented for this release. I tried 'leveldb_compression'
>>> but without visible effect, and I wasn't brave enough to try
>>> 'leveldb_compact_on_mount' on the production env. But setting it to true
>>> on my test 0.94.5 cluster makes the OSD fail on restart.
>>>
>>> Kind regards -
>>> Krzysztof Księżyk

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
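
[Editorial note] For anyone landing on this thread later: the mitigation discussed above is to enable bucket index sharding before a bucket is created, since in these releases an existing bucket's index cannot be resharded in place and the data has to be copied into a new bucket. Below is a minimal ceph.conf sketch; the option names are real Ceph settings, but the shard count, the RGW section name and the bucket name are placeholder values for illustration, and (as Krzysztof notes) whether the leveldb options take effect on 9.2.0 is not documented.

  # ceph.conf (sketch) -- the section name depends on how your RGW instance is named
  [client.radosgw.gateway]
      # Only affects buckets created after the gateway restarts with this set;
      # existing buckets keep their single, unsharded index object.
      rgw override bucket index max shards = 16

  [osd]
      # LevelDB tuning mentioned above, for the OSDs holding the index pool.
      # Test on a non-production cluster first -- compact-on-mount reportedly
      # made a 0.94.5 test OSD fail to restart.
      leveldb compression = false
      leveldb compact on mount = true

After restarting the gateway, create a fresh bucket and confirm its metadata responds quickly, e.g.:

  radosgw-admin bucket stats --bucket=<new-bucket-name>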