Hi Oscar,

We have also gone back to civetweb. Every beast frontend ran up a huge memory load, 98% of host memory (64 GB), and stopped accepting traffic until a reboot. The daemons look unstable right now, at least in 14.2.1.

Now every RGW (civetweb) consumes close to 6 GB of RAM.

We are also still trying to find out why our Ceph RGW listing has become so slow.

We found that 100% of the HEAD requests to the RGW get a 404 response. Is that OK?

Example:

2019-05-19 07:37:22.756 7f5fc5855700 1 civetweb: 0x55c84595b618: 172.16.2.8 - - [19/May/2019:07:37:22 +0200] "HEAD /Infoself/MBS-ccb4da86-2b33-4291-ba8b-dd21d4b16e45/CBB_SRV-CONTROL/CBB_VM/192.168.7.247/Servidor%20pc1/Hard%20disk%201%24/20190518220111/85.cbrevision HTTP/1.1" 404 225 - CloudBerryLab.Base.HttpUtil.Client 5.9.5
2019-05-19 07:37:27.490 7f5fcd064700 1 civetweb: 0x55c845952270: 172.16.2.8 - - [19/May/2019:07:37:27 +0200] "HEAD /ControlGroup/MBS-e9045ebd-3174-46d4-9ecf-5c2e572a89b5/CBB_SERVERL/CBB_DiskImage/Disk_00000000-0000-0000-0000-000000000000/Volume_NTFS_00000000-0000-0000-0000-000000000001%24/20190505010049/41.cbrevision HTTP/1.1" 404 225 - CloudBerryLab.Base.HttpUtil.Client 5.9.5
2019-05-19 07:37:32.488 7f5fca85f700 1 civetweb: 0x55c8459553a8: 172.16.2.8 - - [19/May/2019:07:37:32 +0200] "HEAD /Infoself/MBS-ccb4da86-2b33-4291-ba8b-dd21d4b16e45/CBB_SRV-CONTROL/CBB_VM/192.168.7.247/Servidor%20pc1/Hard%20disk%201%24/20190518220111/85.cbrevision HTTP/1.1" 404 225 - CloudBerryLab.Base.HttpUtil.Client 5.9.5
2019-05-19 07:37:38.987 7f5fd086b700 1 civetweb: 0x55c84594dd88: 172.16.2.8 - - [19/May/2019:07:37:38 +0200] "HEAD /Infoself/MBS-ccb4da86-2b33-4291-ba8b-dd21d4b16e45/CBB_SRV-CONTROL/CBB_VM/192.168.7.247/Servidor%20pc1/Hard%20disk%201%24/20190518220111/85.cbrevision HTTP/1.1" 404 225 - CloudBerryLab.Base.HttpUtil.Client 5.9.5
2019-05-19 07:37:41.178 7f5fd206e700 1 civetweb: 0x55c84594c000: 172.16.2.8 - - [19/May/2019:07:37:41 +0200] "HEAD /Infoself/MBS-ccb4da86-2b33-4291-ba8b-dd21d4b16e45/CBB_SRV-CONTROL/CBB_VM/192.168.7.247/Servidor%20pc1/Hard%20disk%201%24/20190518220111/85.cbrevision HTTP/1.1" 404 225 - CloudBerryLab.Base.HttpUtil.Client 5.9.5
2019-05-19 07:37:51.993 7f5fcd865700 1 civetweb: 0x55c845951898: 172.16.2.8 - - [19/May/2019:07:37:51 +0200] "HEAD /ControlGroup/MBS-e9045ebd-3174-46d4-9ecf-5c2e572a89b5/CBB_SERVERL/CBB_DiskImage/Disk_00000000-0000-0000-0000-000000000000/Volume_NTFS_00000000-0000-0000-0000-000000000001%24/20190505010049/41.cbrevision HTTP/1.1" 404 225 - CloudBerryLab.Base.HttpUtil.Client

Regards,
Manuel

-----Original Message-----
From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> On behalf of Oscar Tiderman
Sent: Friday, 10 May 2019 10:14
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: RGW Bucket unable to list buckets 100TB bucket

On 10/05/2019 08:42, EDH - Manuel Rios Fernandez wrote:
> Hi,
>
> Last night we added 2 Intel Optane NVMe drives.
>
> We created 4 partitions on each to get the maximum performance (Q=32) out of those monsters, 8 partitions of 50 GB in total.
>
> We moved the rgw.index pool onto them; it filled about 3 GB.
>
> And...
>
> Still the same issue: listing buckets is so slow that it is unusable for everyday work whenever you need a listing.
>
> I still don't know how we can optimize it further. Any suggestions or ideas?
>
> Note: we also upgraded to Ceph Nautilus 14.2.1 to check whether some of the fixes also help.
>
> With the new RGW, the log at debug level 2 now includes a "latency" parameter:
>
> 2019-05-10 08:39:39.793 7f4587482700 1 ====== req done req=0x55e6163948e0 op status=0 http_status=200 latency=214.109s ======
> 2019-05-10 08:41:38.240 7f451ebb1700 1 ====== req done req=0x55e6163348e0 op status=0 http_status=200 latency=144.57s ======
>
> Sometimes it reaches 214 (seconds??)
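A note on the "latency" field quoted above: both sample lines carry an "s" suffix (214.109s and 144.57s), so the value appears to be in seconds. The following is a minimal Python sketch, not part of Ceph, for pulling those values out of "req done" lines so the slowest requests can be ranked. The regular expression, the `slow_requests` helper and the 60-second threshold are illustrative assumptions based only on the two lines shown in this thread.

```python
import re

# Matches the summary line RGW prints when a request finishes, e.g.
# "====== req done req=0x... op status=0 http_status=200 latency=214.109s ======"
# The latency group is optional because older log lines in this thread omit it.
REQ_DONE = re.compile(
    r"req done\s+req=(?P<req>0x[0-9a-f]+)\s+op status=(?P<op>-?\d+)\s+"
    r"http_status=(?P<http>\d+)(?:\s+latency=(?P<latency>[\d.]+)s)?"
)


def slow_requests(lines, threshold_s=60.0):
    """Return (req, http_status, latency_s) tuples slower than threshold_s, slowest first."""
    hits = []
    for line in lines:
        m = REQ_DONE.search(line)
        if m is None or m.group("latency") is None:
            continue  # not a summary line, or a line without a latency field
        latency = float(m.group("latency"))
        if latency > threshold_s:
            hits.append((m.group("req"), int(m.group("http")), latency))
    return sorted(hits, key=lambda h: h[2], reverse=True)


# The two lines quoted in this thread, with the mail wrapping removed.
sample = [
    "2019-05-10 08:39:39.793 7f4587482700 1 ====== req done "
    "req=0x55e6163948e0 op status=0 http_status=200 latency=214.109s ======",
    "2019-05-10 08:41:38.240 7f451ebb1700 1 ====== req done "
    "req=0x55e6163348e0 op status=0 http_status=200 latency=144.57s ======",
]

for req, http, latency in slow_requests(sample):
    print(f"{req} http={http} latency={latency:.3f}s")
```

Feeding the whole RGW log file through `slow_requests` (for example with `open(path)` as the `lines` argument) would give a quick list of the requests worth correlating with the listing complaints.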
>
> Best Regards,
>
> Manuel
>
>
> -----Original Message-----
> From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> On behalf of EDH - Manuel Rios Fernandez
> Sent: Saturday, 4 May 2019 15:53
> To: 'Matt Benjamin' <mbenjami@xxxxxxxxxx>
> CC: 'ceph-users' <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re: RGW Bucket unable to list buckets 100TB bucket
>
> Hi Folks,
>
> The user tells us that their software times out after 10 minutes (600 seconds).
>
> Reading the documentation, I think we can set this parameter to 3600 seconds, since Amazon uses that as its timeout:
>
> rgw op thread timeout
>
> Description: The timeout in seconds for open threads.
> Type: Integer
> Default: 600
>
> Of course, listing a bucket with 7M objects is painful; maybe this would help by allowing the software to complete the listing?
>
> Best Regards
> Manuel
>
> -----Original Message-----
> From: Matt Benjamin <mbenjami@xxxxxxxxxx>
> Sent: Friday, 3 May 2019 15:47
> To: EDH - Manuel Rios Fernandez <mriosfer@xxxxxxxxxxxxxxxx>
> CC: ceph-users <ceph-users@xxxxxxxxxxxxxx>
> Subject: Re: RGW Bucket unable to list buckets 100TB bucket
>
> I think I would not override the default value for "rgw list buckets max chunk"; I have no experience doing that, though I can see why it might be plausible.
>
> Matt
>
> On Fri, May 3, 2019 at 9:39 AM EDH - Manuel Rios Fernandez <mriosfer@xxxxxxxxxxxxxxxx> wrote:
>> Since the changes, we are now getting some other errors...
>>
>> 2019-05-03 15:37:28.604 7f499a2e8700 1 ====== starting new request req=0x55f326692970 =====
>> 2019-05-03 15:37:28.604 7f499a2e8700 2 req 23651:0s::GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision::initializing for trans_id = tx000000000000000005c63-005ccc4418-e76558-default
>> 2019-05-03 15:37:28.604 7f499a2e8700 2 req 23651:0s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision::getting op 0
>> 2019-05-03 15:37:28.604 7f499a2e8700 2 req 23651:0s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:verifying requester
>> 2019-05-03 15:37:28.604 7f499a2e8700 2 req 23651:0s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:normalizing buckets and tenants
>> 2019-05-03 15:37:28.604 7f499a2e8700 2 req 23651:0s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:init permissions
>> 2019-05-03 15:37:28.604 7f499a2e8700 2 req 23651:0s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:recalculating target
>> 2019-05-03 15:37:28.604 7f499a2e8700 2 req 23651:0s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:reading permissions
>> 2019-05-03 15:37:28.607 7f499a2e8700 2 req 23651:0.003s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:init op
>> 2019-05-03 15:37:28.607 7f499a2e8700 2 req 23651:0.003s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:verifying op mask
>> 2019-05-03 15:37:28.607 7f499a2e8700 2 req 23651:0.003s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:verifying op permissions
>> 2019-05-03 15:37:28.607 7f499a2e8700 2 req 23651:0.003s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:verifying op params
>> 2019-05-03 15:37:28.607 7f499a2e8700 2 req 23651:0.003s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:pre-executing
>>
>> 2019-05-03 15:37:28.607 7f499a2e8700 2 req 23651:0.003s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:executing
>> 2019-05-03 15:37:28.958 7f4a68484700 -1 write_data failed: Connection reset by peer
>> 2019-05-03 15:37:28.959 7f4a68484700 0 ERROR: flush_read_list(): d->client_cb->handle_data() returned -104
>> 2019-05-03 15:37:28.959 7f4a68484700 2 req 23574:41.87s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:completing
>> 2019-05-03 15:37:28.959 7f4a68484700 2 req 23574:41.87s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:op status=-104
>> 2019-05-03 15:37:28.959 7f4a68484700 2 req 23574:41.87s:s3:GET /CBERRY/MBS-69e38e26-5516-4b83-864c-d2b05b9db5af/CBB_DFTDI/CBB_DiskImage/Disk_011c3bdb-ac85-41f4-b727-c263001ba42f/Volume_Unknown_fbf0ea7a-af96-4dd4-9ad5-dbf6efdeefdc%24/20190430074414/0.cbrevision:get_obj:http status=206
>> 2019-05-03 15:37:28.959 7f4a68484700 1 ====== req done req=0x55f2fde20970 op status=-104 http_status=206 ======
>>
>>
>> -----Original Message-----
>> From: EDH - Manuel Rios Fernandez <mriosfer@xxxxxxxxxxxxxxxx>
>> Sent: Friday, 3 May 2019 15:12
>> To: 'Matt Benjamin' <mbenjami@xxxxxxxxxx>
>> CC: 'ceph-users' <ceph-users@xxxxxxxxxxxxxx>
>> Subject: RE: RGW Bucket unable to list buckets 100TB bucket
>>
>> Hi Matt,
>>
>> Thanks for your help,
>>
>> We made the changes and rebooted the MONs and RGWs, which seemed strangely stuck; now we are able to list 250 directories.
>>
>> time s3cmd ls s3://datos101 --no-ssl --limit 150
>> real 2m50.854s
>> user 0m0.147s
>> sys 0m0.042s
>>
>>
>> Is there any recommendation for max_shard?
>>
>> Our main use case is cold storage; our typical usage is backups or customers with tons of files. As a result, customers store millions of objects in a single bucket.
>>
>> It's strange, because this issue started on Friday without any warning or error in the OSD / RGW logs.
>>
>> At what point should we warn customers that they will not be able to list their directory once they reach X million objects?
>>
>> Our current ceph.conf
>>
>> #Normal-Memory 1/5
>> debug rgw = 2
>> #Disable
>> debug osd = 0
>> debug journal = 0
>> debug ms = 0
>>
>> fsid = e1ee8086-7cce-43fd-a252-3d677af22428
>> mon_initial_members = CEPH001, CEPH002, CEPH003
>> mon_host = 172.16.2.10,172.16.2.11,172.16.2.12
>> auth_cluster_required = cephx
>> auth_service_required = cephx
>> auth_client_required = cephx
>> osd pool default pg num = 128
>> osd pool default pgp num = 128
>>
>> public network = 172.16.2.0/24
>> cluster network = 172.16.1.0/24
>>
>> osd pool default size = 2
>> osd pool default min size = 1
>>
>> rgw_dynamic_resharding = true
>> #Increment to 128
>> rgw_override_bucket_index_max_shards = 128
>>
>> #Default: 1000
>> rgw list buckets max chunk = 5000
>>
>>
>>
>> [osd]
>> osd mkfs type = xfs
>> osd op threads = 12
>> osd disk threads = 12
>>
>> osd recovery threads = 4
>> osd recovery op priority = 1
>> osd recovery max active = 2
>> osd recovery max single start = 1
>>
>> osd max backfills = 4
>> osd backfill scan max = 16
>> osd backfill scan min = 4
>> osd client op priority = 63
>>
>>
>> osd_memory_target = 2147483648
>>
>> osd_scrub_begin_hour = 23
>> osd_scrub_end_hour = 6
>> osd_scrub_load_threshold = 0.25 #low load scrubbing
>> osd_scrub_during_recovery = false #scrub during recovery
>>
>> [mon]
>> mon allow pool delete = true
>> mon osd min down reporters = 3
>>
>> [mon.a]
>> host = CEPH001
>> public bind addr = 172.16.2.10
>> mon addr = 172.16.2.10:6789
>> mon allow pool delete = true
>>
>> [mon.b]
>> host = CEPH002
>> public bind addr = 172.16.2.11
>> mon addr = 172.16.2.11:6789
>> mon allow pool delete = true
>>
>> [mon.c]
>> host = CEPH003
>> public bind addr = 172.16.2.12
>> mon addr = 172.16.2.12:6789
>> mon allow pool delete = true
>>
>> [client.rgw]
>> rgw enable usage log = true
>>
>>
>> [client.rgw.ceph-rgw01]
>> host = ceph-rgw01
>> rgw enable usage log = true
>> rgw dns name =
>> rgw frontends = "beast port=7480"
>> rgw resolve cname = false
>> rgw thread pool size = 512
>> rgw num rados handles = 1
>> rgw op thread timeout = 600
>>
>>
>> [client.rgw.ceph-rgw03]
>> host = ceph-rgw03
>> rgw enable usage log = true
>> rgw dns name =
>> rgw frontends = "beast port=7480"
>> rgw resolve cname = false
>> rgw thread pool size = 512
>> rgw num rados handles = 1
>> rgw op thread timeout = 600
>>
>>
>> On Thursday the customer told us that listings were instant, and now their programs stall until they time out.
>>
>> Best Regards
>>
>> Manuel
>>
>> -----Original Message-----
>> From: Matt Benjamin <mbenjami@xxxxxxxxxx>
>> Sent: Friday, 3 May 2019 14:00
>> To: EDH - Manuel Rios Fernandez <mriosfer@xxxxxxxxxxxxxxxx>
>> CC: ceph-users <ceph-users@xxxxxxxxxxxxxx>
>> Subject: Re: RGW Bucket unable to list buckets 100TB bucket
>>
>> Hi Folks,
>>
>> Thanks for sharing your ceph.conf along with the behavior.
>>
>> There are some odd things there.
>>
>> 1. rgw_num_rados_handles is deprecated--it should be 1 (the default),
>> but changing it may require you to check and retune the values for
>> objecter_inflight_ops and objecter_inflight_op_bytes to be larger
>> 2. you have very different rgw_thread_pool_size values on these two
>> gateways; a value between 512 and 1024 is usually best (future rgws
>> will not rely on large thread pools)
>> 3. the actual behavior with 128 shards might be assisted by listing
>> in unordered mode--HOWEVER, there was a bug in this feature which
>> caused a perf regression and masked the benefit--make sure you have
>> applied the fix for https://tracker.ceph.com/issues/39393 before
>> evaluating
>>
>> regards,
>>
>> Matt
>>
>> On Fri, May 3, 2019 at 4:57 AM EDH - Manuel Rios Fernandez <mriosfer@xxxxxxxxxxxxxxxx> wrote:
>>> Hi,
>>>
>>> We have a Ceph deployment on version 13.2.5, with several buckets holding millions of files.
>>>
>>> services:
>>>   mon: 3 daemons, quorum CEPH001,CEPH002,CEPH003
>>>   mgr: CEPH001(active)
>>>   osd: 106 osds: 106 up, 106 in
>>>   rgw: 2 daemons active
>>>
>>> data:
>>>   pools: 17 pools, 7120 pgs
>>>   objects: 106.8 M objects, 271 TiB
>>>   usage: 516 TiB used, 102 TiB / 619 TiB avail
>>>   pgs: 7120 active+clean
>>>
>>> We ran a test on a spare RGW server for this case.
>>>
>>> The customer reports being unable to list their buckets. We tested from a monitor node with the command:
>>>
>>> s3cmd ls s3://[bucket] --no-ssl --limit 20
>>>
>>> It takes 1m and 2 secs.
>>>
>>> RGW log in debug mode = 2
>>>
>>> 2019-05-03 10:40:25.449 7f65f63e1700 1 ====== starting new request req=0x55eba26e8970 =====
>>> 2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s::GET /[bucketname]/::initializing for trans_id = tx000000000000000000071-005ccbfe79-e6283e-default
>>> 2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/::getting op 0
>>> 2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:verifying requester
>>> 2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:normalizing buckets and tenants
>>> 2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:init permissions
>>> 2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:recalculating target
>>> 2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:reading permissions
>>> 2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:init op
>>> 2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:verifying op mask
>>> 2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:verifying op permissions
>>> 2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:verifying op params
>>> 2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:pre-executing
>>> 2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:executing
>>> 2019-05-03 10:40:41.026 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:41:03.026 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:41:25.026 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:41:47.026 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:41:49.395 7f65f63e1700 2 req 113:83.9461s:s3:GET /[bucketname]/:list_bucket:completing
>>> 2019-05-03 10:41:49.395 7f65f63e1700 2 req 113:83.9461s:s3:GET /[bucketname]/:list_bucket:op status=0
>>> 2019-05-03 10:41:49.395 7f65f63e1700 2 req 113:83.9461s:s3:GET /[bucketname]/:list_bucket:http status=200
>>> 2019-05-03 10:41:49.395 7f65f63e1700 1 ====== req done req=0x55eba26e8970 op status=0 http_status=200 ======
>>>
>>> time s3cmd ls s3://[bucket] --no-ssl --limit 100
>>>
>>> real 4m26.318s
>>>
>>> 2019-05-03 10:42:36.439 7f65f33db700 1 ====== starting new request req=0x55eba26e8970 =====
>>> 2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s::GET /[bucketname]/::initializing for trans_id = tx000000000000000000073-005ccbfefc-e6283e-default
>>> 2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/::getting op 0
>>> 2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:verifying requester
>>> 2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:normalizing buckets and tenants
>>> 2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:init permissions
>>> 2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:recalculating target
>>> 2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:reading permissions
>>> 2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:init op
>>> 2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:verifying op mask
>>> 2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:verifying op permissions
>>> 2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:verifying op params
>>> 2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:pre-executing
>>> 2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:executing
>>> 2019-05-03 10:42:53.026 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:43:15.027 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:43:37.028 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:43:59.027 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:44:21.028 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:44:43.027 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:45:05.027 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:45:18.260 7f660cc0e700 2 object expiration: start
>>> 2019-05-03 10:45:18.779 7f660cc0e700 2 object expiration: stop
>>> 2019-05-03 10:45:27.027 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:45:49.027 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:46:11.027 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:46:33.027 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:46:55.028 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
>>> 2019-05-03 10:47:02.092 7f65f33db700 2 req 115:265.652s:s3:GET /[bucketname]/:list_bucket:completing
>>> 2019-05-03 10:47:02.092 7f65f33db700 2 req 115:265.652s:s3:GET /[bucketname]/:list_bucket:op status=0
>>> 2019-05-03 10:47:02.092 7f65f33db700 2 req 115:265.652s:s3:GET /[bucketname]/:list_bucket:http status=200
>>> 2019-05-03 10:47:02.092 7f65f33db700 1 ====== req done req=0x55eba26e8970 op status=0 http_status=200 ======
>>>
>>> radosgw-admin bucket limit check
>>>
>>> {
>>>     "bucket": "[BUCKETNAME]",
>>>     "tenant": "",
>>>     "num_objects": 7126133,
>>>     "num_shards": 128,
>>>     "objects_per_shard": 55672,
>>>     "fill_status": "OK"
>>> },
>>>
>>> We really don't know how to solve this; it looks like a timeout or slow performance for that bucket.
>>>
>>> Our RGW section in ceph.conf
>>>
>>> [client.rgw.ceph-rgw01]
>>> host = ceph-rgw01
>>> rgw enable usage log = true
>>> rgw dns name = XXXXXX
>>> rgw frontends = "beast port=7480"
>>> rgw resolve cname = false
>>> rgw thread pool size = 128
>>> rgw num rados handles = 1
>>> rgw op thread timeout = 120
>>>
>>> [client.rgw.ceph-rgw03]
>>> host = ceph-rgw03
>>> rgw enable usage log = true
>>> rgw dns name = XXXXXXXX
>>> rgw frontends = "beast port=7480"
>>> rgw resolve cname = false
>>> rgw thread pool size = 640
>>> rgw num rados handles = 16
>>> rgw op thread timeout = 120
>>>
>>> Best Regards,
>>>
>>> Manuel
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>> --
>>
>> Matt Benjamin
>> Red Hat, Inc.
>> 315 West Huron Street, Suite 140A
>> Ann Arbor, Michigan 48103
>>
>> http://www.redhat.com/en/technologies/storage
>>
>> tel. 734-821-5101
>> fax. 734-769-8938
>> cel. 734-216-5309
>>
>

Hi,

I'm not at all sure this would make a difference in your case, but I recall us having some trouble with Beast frontend performance a while back; I'm not sure what caused it.

Perhaps you could try the Civetweb frontend, just to see if that makes a difference.
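On the shard question raised earlier in the thread: the "objects_per_shard" figure in the quoted `radosgw-admin bucket limit check` output matches the object count divided by the shard count, rounded down. A minimal Python sketch of that arithmetic follows. The helper names are hypothetical, and the `target_per_shard` default of 100000 is an assumption based on the commonly cited default of `rgw_max_objs_per_shard`; please verify it against your release before relying on it.

```python
import math


def objects_per_shard(num_objects, num_shards):
    """Average index entries per shard, rounded down (matches the quoted output)."""
    return num_objects // num_shards


def shards_needed(num_objects, target_per_shard=100_000):
    """Smallest shard count keeping the average at or below target_per_shard.

    target_per_shard is an assumed value, not taken from this thread.
    """
    return max(1, math.ceil(num_objects / target_per_shard))


num_objects = 7_126_133  # "num_objects" from the quoted bucket limit check
num_shards = 128         # "num_shards" from the same output

print(objects_per_shard(num_objects, num_shards))
print(shards_needed(num_objects))
```

Under that assumed target the bucket already has more shards than it strictly needs, which is at least consistent with the "fill_status": "OK" in the quoted output.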
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com