Re: RGW Bucket unable to list buckets 100TB bucket

On 10/05/2019 08:42, EDH - Manuel Rios Fernandez wrote:
> Hi,
>
> Last night we added 2 Intel Optane NVMe drives.
>
> We created 4 partitions on each to get the maximum performance (Q=32) out of those monsters, 8 partitions of 50GB in total.
>
> We then moved the rgw.index pool onto them; it is filled to nearly 3GB.
>
> And...
>
> Still the same issue: listing buckets is really slow, sometimes so slow that it is unusable for everyday work whenever you need a listing.
>
> I still don't know how we can optimize it further. Any suggestions/ideas?
>
> Note: we also upgraded to Ceph Nautilus 14.2.1 to check whether any fixes there would help.
>
> With the new RGW, the log at debug level 2 now includes a "latency" field:
> 2019-05-10 08:39:39.793 7f4587482700  1 ====== req done req=0x55e6163948e0 op status=0 http_status=200 latency=214.109s ======
> 2019-05-10 08:41:38.240 7f451ebb1700  1 ====== req done req=0x55e6163348e0 op status=0 http_status=200 latency=144.57s ======
>
> Sometimes it reaches 214 (seconds??)
>
> Best Regards,
>
> Manuel
>
>
> -----Original Message-----
> From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> On behalf of EDH - Manuel Rios Fernandez
> Sent: Saturday, May 4, 2019 15:53
> To: 'Matt Benjamin' <mbenjami@xxxxxxxxxx>
> CC: 'ceph-users' <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re:  RGW Bucket unable to list buckets 100TB bucket
>
> Hi Folks,
>
> The user tells us that their software hits a timeout after 10 minutes (600 seconds).
>
> Reading the documentation, I think we can raise the param below to 3600 seconds, which is the timeout Amazon uses:
>
> rgw op thread timeout 
>
> Description:	The timeout in seconds for open threads.
> Type:	Integer
> Default:	600
>
> Of course listing a bucket with 7M objects is painful; maybe this would help the software complete the listing?
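>
> A minimal sketch of what I mean in ceph.conf (untested on our side; each RGW needs a restart to pick it up):
>
> [client.rgw.ceph-rgw01]
>     # raise from the 600s default to roughly match an AWS-style 1h timeout
>     rgw op thread timeout = 3600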
>
> Best Regards
> Manuel
>
> -----Original Message-----
> From: Matt Benjamin <mbenjami@xxxxxxxxxx>
> Sent: Friday, May 3, 2019 15:47
> To: EDH - Manuel Rios Fernandez <mriosfer@xxxxxxxxxxxxxxxx>
> CC: ceph-users <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re:  RGW Bucket unable to list buckets 100TB bucket
>
> I think I would not override the default value for "rgw list buckets max chunk"; I have no experience doing that, though I can see why it might seem plausible.
>
> Matt
>
> On Fri, May 3, 2019 at 9:39 AM EDH - Manuel Rios Fernandez <mriosfer@xxxxxxxxxxxxxxxx> wrote:
>> After the changes, right now we are getting some other errors...
>>
>> 2019-05-03 15:37:28.604 7f499a2e8700  1 ====== starting new request req=0x55f326692970 =====
>> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s::GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision::initializing for trans_id = tx000000000000000005c63-005ccc4418-e76558-default
>> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision::getting op 0
>> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:verifying requester
>> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:normalizing buckets and tenants
>> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:init permissions
>> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:recalculating target
>> 2019-05-03 15:37:28.604 7f499a2e8700  2 req 23651:0s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:reading permissions
>> 2019-05-03 15:37:28.607 7f499a2e8700  2 req 23651:0.003s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:init op
>> 2019-05-03 15:37:28.607 7f499a2e8700  2 req 23651:0.003s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:verifying op mask
>> 2019-05-03 15:37:28.607 7f499a2e8700  2 req 23651:0.003s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:verifying op permissions
>> 2019-05-03 15:37:28.607 7f499a2e8700  2 req 23651:0.003s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:verifying op params
>> 2019-05-03 15:37:28.607 7f499a2e8700  2 req 23651:0.003s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:pre-executing
>> 2019-05-03 15:37:28.607 7f499a2e8700  2 req 23651:0.003s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:executing
>> 2019-05-03 15:37:28.958 7f4a68484700 -1 write_data failed: Connection reset by peer
>> 2019-05-03 15:37:28.959 7f4a68484700  0 ERROR: flush_read_list(): d->client_cb->handle_data() returned -104
>> 2019-05-03 15:37:28.959 7f4a68484700  2 req 23574:41.87s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:completing
>> 2019-05-03 15:37:28.959 7f4a68484700  2 req 23574:41.87s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:op status=-104
>> 2019-05-03 15:37:28.959 7f4a68484700  2 req 23574:41.87s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:http status=206
>> 2019-05-03 15:37:28.959 7f4a68484700  1 ====== req done req=0x55f2fde20970 op status=-104 http_status=206 ======
>>
>>
>> -----Original Message-----
>> From: EDH - Manuel Rios Fernandez <mriosfer@xxxxxxxxxxxxxxxx>
>> Sent: Friday, May 3, 2019 15:12
>> To: 'Matt Benjamin' <mbenjami@xxxxxxxxxx>
>> CC: 'ceph-users' <ceph-users@xxxxxxxxxxxxxx>
>> Subject: RE:  RGW Bucket unable to list buckets 100TB bucket
>>
>> Hi Matt,
>>
>> Thanks for your help,
>>
>> We have made the changes plus a reboot of the MONs and RGWs, which had looked strangely stuck; now we're able to list 250 directories.
>>
>> time s3cmd ls s3://datos101 --no-ssl --limit 150
>> real    2m50.854s
>> user    0m0.147s
>> sys     0m0.042s
>>
>>
>> Is there any recommendation for max_shards?
>>
>> Our main use case is cold storage; normally our usage is backups or customers with tons of files. This means customers store millions of objects in a single bucket.
>>
>> It's strange because this issue started on Friday without any warning or error in the OSD / RGW logs.
>>
>> At what point should we warn a customer that they will not be able to list their directory once they reach X million objects?
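>>
>> From the docs I assume dynamic resharding (which we have enabled) splits a bucket index once it passes rgw_max_objs_per_shard (default 100000) objects per shard, so a hedged sketch for staying ahead of it by hand would be (bucket name and shard count are just examples):
>>
>> # check per-shard fill for every bucket
>> radosgw-admin bucket limit check
>> # queue a manual reshard and run the queue
>> radosgw-admin reshard add --bucket=datos101 --num-shards=256
>> radosgw-admin reshard process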
>>
>> Our current ceph.conf
>>
>> #Normal-Memory 1/5
>> debug rgw = 2
>> #Disable
>> debug osd = 0
>> debug journal = 0
>> debug ms = 0
>>
>> fsid = e1ee8086-7cce-43fd-a252-3d677af22428
>> mon_initial_members = CEPH001, CEPH002, CEPH003
>> mon_host = 172.16.2.10,172.16.2.11,172.16.2.12
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>> osd pool default pg num = 128
>> osd pool default pgp num = 128
>>
>> public network = 172.16.2.0/24
>> cluster network = 172.16.1.0/24
>>
>> osd pool default size = 2
>> osd pool default min size = 1
>>
>> rgw_dynamic_resharding = true
>> #Increment to 128
>> rgw_override_bucket_index_max_shards = 128
>>
>> #Default: 1000
>> rgw list buckets max chunk = 5000
>>
>>
>>
>> [osd]
>> osd mkfs type = xfs
>> osd op threads = 12
>> osd disk threads = 12
>>
>> osd recovery threads = 4
>> osd recovery op priority = 1
>> osd recovery max active = 2
>> osd recovery max single start = 1
>>
>> osd max backfills = 4
>> osd backfill scan max = 16
>> osd backfill scan min = 4
>> osd client op priority = 63
>>
>>
>> osd_memory_target = 2147483648
>>
>> osd_scrub_begin_hour = 23
>> osd_scrub_end_hour = 6
>> osd_scrub_load_threshold = 0.25 #low load scrubbing 
>> osd_scrub_during_recovery = false #scrub during recovery
>>
>> [mon]
>>     mon allow pool delete = true
>>     mon osd min down reporters = 3
>>
>> [mon.a]
>>     host = CEPH001
>>     public bind addr = 172.16.2.10
>>     mon addr = 172.16.2.10:6789
>>     mon allow pool delete = true
>>
>> [mon.b]
>>     host = CEPH002
>>     public bind addr = 172.16.2.11
>>     mon addr = 172.16.2.11:6789
>>     mon allow pool delete = true
>>
>> [mon.c]
>>     host = CEPH003
>>     public bind addr = 172.16.2.12
>>     mon addr = 172.16.2.12:6789
>>     mon allow pool delete = true
>>
>> [client.rgw]
>>  rgw enable usage log = true
>>
>>
>> [client.rgw.ceph-rgw01]
>>  host = ceph-rgw01
>>  rgw enable usage log = true
>>  rgw dns name =
>>  rgw frontends = "beast port=7480"
>>  rgw resolve cname = false
>>  rgw thread pool size = 512
>>  rgw num rados handles = 1
>>  rgw op thread timeout = 600
>>
>>
>> [client.rgw.ceph-rgw03]
>>  host = ceph-rgw03
>>  rgw enable usage log = true
>>  rgw dns name =
>>  rgw frontends = "beast port=7480"
>>  rgw resolve cname = false
>>  rgw thread pool size = 512
>>  rgw num rados handles = 1
>>  rgw op thread timeout = 600
>>
>>
>> On Thursday the customer told us that listings were instant; now their programs stall until they time out.
>>
>> Best Regards
>>
>> Manuel
>>
>> -----Original Message-----
>> From: Matt Benjamin <mbenjami@xxxxxxxxxx>
>> Sent: Friday, May 3, 2019 14:00
>> To: EDH - Manuel Rios Fernandez <mriosfer@xxxxxxxxxxxxxxxx>
>> CC: ceph-users <ceph-users@xxxxxxxxxxxxxx>
>> Subject: Re:  RGW Bucket unable to list buckets 100TB bucket
>>
>> Hi Folks,
>>
>> Thanks for sharing your ceph.conf along with the behavior.
>>
>> There are some odd things there.
>>
>> 1. rgw_num_rados_handles is deprecated--it should be 1 (the default), but changing it may require you to check and retune the values for objecter_inflight_ops and objecter_inflight_op_bytes to be larger.
>> 2. You have very different rgw_thread_pool_size values on these two gateways; a value between 512 and 1024 is usually best (future rgws will not rely on large thread pools).
>> 3. The actual behavior with 128 shards might be assisted by listing in unordered mode (see the sketch below)--HOWEVER, there was a bug in this feature which caused a perf regression and masked the benefit--make sure you have applied the fix for https://tracker.ceph.com/issues/39393 before evaluating.
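>>
>> For reference, unordered listing is requested through RGW's allow-unordered extension to S3 bucket listing; a minimal sketch of the request shape (host and bucket are placeholders, and the request still needs normal S3 signing):
>>
>> GET /datos101/?allow-unordered=true HTTP/1.1
>> Host: rgw.example.com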
>>
>> regards,
>>
>> Matt
>>
>> On Fri, May 3, 2019 at 4:57 AM EDH - Manuel Rios Fernandez <mriosfer@xxxxxxxxxxxxxxxx> wrote:
>>> Hi,
>>>
>>>
>>>
>>> We have a Ceph 13.2.5 deployment, with several buckets holding millions of files.
>>>
>>>
>>>
>>>   services:
>>>     mon: 3 daemons, quorum CEPH001,CEPH002,CEPH003
>>>     mgr: CEPH001(active)
>>>     osd: 106 osds: 106 up, 106 in
>>>     rgw: 2 daemons active
>>>
>>>   data:
>>>     pools:   17 pools, 7120 pgs
>>>     objects: 106.8 M objects, 271 TiB
>>>     usage:   516 TiB used, 102 TiB / 619 TiB avail
>>>     pgs:     7120 active+clean
>>>
>>>
>>>
>>> We ran a test on a spare RGW server for this case.
>>>
>>> The customer reports that they are unable to list their buckets; we tested from a monitor with the command:
>>>
>>> s3cmd ls s3://[bucket] --no-ssl --limit 20
>>>
>>> It takes 1m 2s.
>>>
>>>
>>>
>>> RGW log at debug level 2:
>>>
>>>
>>>
>>> 2019-05-03 10:40:25.449 7f65f63e1700  1 ====== starting new request req=0x55eba26e8970 =====
>>> 2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s::GET /[bucketname]/::initializing for trans_id = tx000000000000000000071-005ccbfe79-e6283e-default
>>> 2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET /[bucketname]/::getting op 0
>>> 2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET /[bucketname]/:list_bucket:verifying requester
>>> 2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET /[bucketname]/:list_bucket:normalizing buckets and tenants
>>> 2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET /[bucketname]/:list_bucket:init permissions
>>> 2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET /[bucketname]/:list_bucket:recalculating target
>>> 2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET /[bucketname]/:list_bucket:reading permissions
>>> 2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET /[bucketname]/:list_bucket:init op
>>> 2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET /[bucketname]/:list_bucket:verifying op mask
>>> 2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET /[bucketname]/:list_bucket:verifying op permissions
>>> 2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET /[bucketname]/:list_bucket:verifying op params
>>> 2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET /[bucketname]/:list_bucket:pre-executing
>>> 2019-05-03 10:40:25.449 7f65f63e1700  2 req 113:0s:s3:GET /[bucketname]/:list_bucket:executing
>>> 2019-05-03 10:40:41.026 7f660e411700  2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:41:03.026 7f660e411700  2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:41:25.026 7f660e411700  2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:41:47.026 7f660e411700  2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:41:49.395 7f65f63e1700  2 req 113:83.9461s:s3:GET /[bucketname]/:list_bucket:completing
>>> 2019-05-03 10:41:49.395 7f65f63e1700  2 req 113:83.9461s:s3:GET /[bucketname]/:list_bucket:op status=0
>>> 2019-05-03 10:41:49.395 7f65f63e1700  2 req 113:83.9461s:s3:GET /[bucketname]/:list_bucket:http status=200
>>> 2019-05-03 10:41:49.395 7f65f63e1700  1 ====== req done req=0x55eba26e8970 op status=0 http_status=200 ======
>>>
>>>
>>>
>>>
>>>
>>> time s3cmd ls s3://[bucket] --no-ssl --limit 100
>>> real    4m26.318s
>>>
>>>
>>>
>>>
>>>
>>> 2019-05-03 10:42:36.439 7f65f33db700  1 ====== starting new request req=0x55eba26e8970 =====
>>> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s::GET /[bucketname]/::initializing for trans_id = tx000000000000000000073-005ccbfefc-e6283e-default
>>> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET /[bucketname]/::getting op 0
>>> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET /[bucketname]/:list_bucket:verifying requester
>>> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET /[bucketname]/:list_bucket:normalizing buckets and tenants
>>> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET /[bucketname]/:list_bucket:init permissions
>>> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET /[bucketname]/:list_bucket:recalculating target
>>> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET /[bucketname]/:list_bucket:reading permissions
>>> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET /[bucketname]/:list_bucket:init op
>>> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET /[bucketname]/:list_bucket:verifying op mask
>>> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET /[bucketname]/:list_bucket:verifying op permissions
>>> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET /[bucketname]/:list_bucket:verifying op params
>>> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET /[bucketname]/:list_bucket:pre-executing
>>> 2019-05-03 10:42:36.439 7f65f33db700  2 req 115:0s:s3:GET /[bucketname]/:list_bucket:executing
>>> 2019-05-03 10:42:53.026 7f660e411700  2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:43:15.027 7f660e411700  2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:43:37.028 7f660e411700  2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:43:59.027 7f660e411700  2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:44:21.028 7f660e411700  2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:44:43.027 7f660e411700  2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:45:05.027 7f660e411700  2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:45:18.260 7f660cc0e700  2 object expiration: start
>>> 2019-05-03 10:45:18.779 7f660cc0e700  2 object expiration: stop
>>> 2019-05-03 10:45:27.027 7f660e411700  2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:45:49.027 7f660e411700  2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:46:11.027 7f660e411700  2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:46:33.027 7f660e411700  2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:46:55.028 7f660e411700  2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:47:02.092 7f65f33db700  2 req 115:265.652s:s3:GET /[bucketname]/:list_bucket:completing
>>> 2019-05-03 10:47:02.092 7f65f33db700  2 req 115:265.652s:s3:GET /[bucketname]/:list_bucket:op status=0
>>> 2019-05-03 10:47:02.092 7f65f33db700  2 req 115:265.652s:s3:GET /[bucketname]/:list_bucket:http status=200
>>> 2019-05-03 10:47:02.092 7f65f33db700  1 ====== req done req=0x55eba26e8970 op status=0 http_status=200 ======
>>>
>>>
>>>
>>>
>>>
>>> radosgw-admin bucket limit check
>>>             }
>>>                 "bucket": "[BUCKETNAME]",
>>>                 "tenant": "",
>>>                 "num_objects": 7126133,
>>>                 "num_shards": 128,
>>>                 "objects_per_shard": 55672,
>>>                 "fill_status": "OK"
>>>             },
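>>>
>>> If I read the docs right, fill_status is derived from objects_per_shard (7126133 / 128 ≈ 55672) compared against a per-shard cap; a sketch of the knobs I assume control that warning (untested, option names taken from the docs, defaults shown):
>>>
>>> # "bucket limit check" reports a warning above rgw_shard_warning_threshold percent of this cap
>>> rgw safe max objects per shard = 102400
>>> rgw shard warning threshold = 90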
>>>
>>>
>>>
>>>
>>>
>>> We really don't know how to solve this; it looks like a timeout or slow performance for that bucket.
>>>
>>>
>>>
>>> Our RGW section in ceph.conf
>>>
>>>
>>>
>>> [client.rgw.ceph-rgw01]
>>> host = ceph-rgw01
>>> rgw enable usage log = true
>>> rgw dns name = XXXXXX
>>> rgw frontends = "beast port=7480"
>>> rgw resolve cname = false
>>> rgw thread pool size = 128
>>> rgw num rados handles = 1
>>> rgw op thread timeout = 120
>>>
>>> [client.rgw.ceph-rgw03]
>>> host = ceph-rgw03
>>> rgw enable usage log = true
>>> rgw dns name = XXXXXXXX
>>> rgw frontends = "beast port=7480"
>>> rgw resolve cname = false
>>> rgw thread pool size = 640
>>> rgw num rados handles = 16
>>> rgw op thread timeout = 120
>>>
>>> Best Regards,
>>>
>>> Manuel
>>>
>>>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>> --
>>
>> Matt Benjamin
>> Red Hat, Inc.
>> 315 West Huron Street, Suite 140A
>> Ann Arbor, Michigan 48103
>>
>> http://www.redhat.com/en/technologies/storage
>>
>> tel.  734-821-5101
>> fax.  734-769-8938
>> cel.  734-216-5309
>>
>
Hi,

Not sure at all whether this would make a difference in your case, but I
recall we had some trouble with Beast frontend performance a while back;
I'm not sure what caused it. Perhaps you could try the Civetweb frontend,
just to see if it makes a difference.
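
For reference, switching should be a one-line change in each gateway's
ceph.conf section, something like the sketch below (untested on my side;
keep the same port so clients are unaffected, and restart the radosgw
daemon afterwards):

[client.rgw.ceph-rgw01]
 rgw frontends = "civetweb port=7480"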

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



