Hi,

We have a Ceph 13.2.5 deployment with several buckets holding millions of files each.

  services:
    mon: 3 daemons, quorum CEPH001,CEPH002,CEPH003
    mgr: CEPH001(active)
    osd: 106 osds: 106 up, 106 in
    rgw: 2 daemons active

  data:
    pools:   17 pools, 7120 pgs
    objects: 106.8 M objects, 271 TiB
    usage:   516 TiB used, 102 TiB / 619 TiB avail
    pgs:     7120 active+clean

We ran a test on a spare RGW server for this case.

A customer reports being unable to list their buckets. We tested from a monitor node with this command:

  s3cmd ls s3://[bucket] --no-ssl --limit 20

It takes 1 minute and 2 seconds.

RGW log at debug level 2:

  2019-05-03 10:40:25.449 7f65f63e1700 1 ====== starting new request req=0x55eba26e8970 =====
  2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s::GET /[bucketname]/::initializing for trans_id = tx000000000000000000071-005ccbfe79-e6283e-default
  2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/::getting op 0
  2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:verifying requester
  2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:normalizing buckets and tenants
  2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:init permissions
  2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:recalculating target
  2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:reading permissions
  2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:init op
  2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:verifying op mask
  2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:verifying op permissions
  2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:verifying op params
  2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:pre-executing
  2019-05-03 10:40:25.449 7f65f63e1700 2 req 113:0s:s3:GET /[bucketname]/:list_bucket:executing
  2019-05-03 10:40:41.026 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
  2019-05-03 10:41:03.026 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
  2019-05-03 10:41:25.026 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
  2019-05-03 10:41:47.026 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
  2019-05-03 10:41:49.395 7f65f63e1700 2 req 113:83.9461s:s3:GET /[bucketname]/:list_bucket:completing
  2019-05-03 10:41:49.395 7f65f63e1700 2 req 113:83.9461s:s3:GET /[bucketname]/:list_bucket:op status=0
  2019-05-03 10:41:49.395 7f65f63e1700 2 req 113:83.9461s:s3:GET /[bucketname]/:list_bucket:http status=200
  2019-05-03 10:41:49.395 7f65f63e1700 1 ====== req done req=0x55eba26e8970 op status=0 http_status=200 ======

The same listing with a limit of 100:

  time s3cmd ls s3://[bucket] --no-ssl --limit 100

  real 4m26.318s

  2019-05-03 10:42:36.439 7f65f33db700 1 ====== starting new request req=0x55eba26e8970 =====
  2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s::GET /[bucketname]/::initializing for trans_id = tx000000000000000000073-005ccbfefc-e6283e-default
  2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/::getting op 0
  2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:verifying requester
  2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:normalizing buckets and tenants
  2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:init permissions
  2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:recalculating target
  2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:reading permissions
  2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:init op
  2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:verifying op mask
  2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:verifying op permissions
  2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:verifying op params
  2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:pre-executing
  2019-05-03 10:42:36.439 7f65f33db700 2 req 115:0s:s3:GET /[bucketname]/:list_bucket:executing
  2019-05-03 10:42:53.026 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
  2019-05-03 10:43:15.027 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
  2019-05-03 10:43:37.028 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
  2019-05-03 10:43:59.027 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
  2019-05-03 10:44:21.028 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
  2019-05-03 10:44:43.027 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
  2019-05-03 10:45:05.027 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
  2019-05-03 10:45:18.260 7f660cc0e700 2 object expiration: start
  2019-05-03 10:45:18.779 7f660cc0e700 2 object expiration: stop
  2019-05-03 10:45:27.027 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
  2019-05-03 10:45:49.027 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
  2019-05-03 10:46:11.027 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
  2019-05-03 10:46:33.027 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
  2019-05-03 10:46:55.028 7f660e411700 2 RGWDataChangesLog::ChangesRenewThread: start
  2019-05-03 10:47:02.092 7f65f33db700 2 req 115:265.652s:s3:GET /[bucketname]/:list_bucket:completing
  2019-05-03 10:47:02.092 7f65f33db700 2 req 115:265.652s:s3:GET /[bucketname]/:list_bucket:op status=0
  2019-05-03 10:47:02.092 7f65f33db700 2 req 115:265.652s:s3:GET /[bucketname]/:list_bucket:http status=200
  2019-05-03 10:47:02.092 7f65f33db700 1 ====== req done req=0x55eba26e8970 op status=0 http_status=200 ======

Output of radosgw-admin bucket limit check for that bucket:

  {
      "bucket": "[BUCKETNAME]",
      "tenant": "",
      "num_objects": 7126133,
      "num_shards": 128,
      "objects_per_shard": 55672,
      "fill_status": "OK"
  },

We really don't know how to solve this. It looks like a timeout or slow performance for that bucket.

Our RGW sections in ceph.conf:

  [client.rgw.ceph-rgw01]
  host = ceph-rgw01
  rgw enable usage log = true
  rgw dns name = XXXXXX
  rgw frontends = "beast port=7480"
  rgw resolve cname = false
  rgw thread pool size = 128
  rgw num rados handles = 1
  rgw op thread timeout = 120

  [client.rgw.ceph-rgw03]
  host = ceph-rgw03
  rgw enable usage log = true
  rgw dns name = XXXXXXXX
  rgw frontends = "beast port=7480"
  rgw resolve cname = false
  rgw thread pool size = 640
  rgw num rados handles = 16
  rgw op thread timeout = 120

Best Regards,
Manuel
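P.S. To compare runs, the server-side time of each request can be read off the RGW log by pairing the "starting new request" and "req done" lines. Below is a minimal sketch, assuming the log keeps the line format shown in this message. The function name request_durations is only illustrative and is not part of any Ceph tooling.

```python
import re
from datetime import datetime

# Marker lines RGW writes at debug level 1; patterns are modelled on the
# log excerpt in this message and are an assumption about the format.
START = re.compile(r"^(\S+ \S+) \S+\s+1 ====== starting new request req=(\S+) =====")
DONE = re.compile(r"^(\S+ \S+) \S+\s+1 ====== req done req=(\S+) ")
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def request_durations(lines):
    """Return the elapsed seconds of each finished request, in completion order."""
    started = {}    # req pointer -> start time of the request using it
    durations = []
    for line in lines:
        line = line.strip()
        m = START.match(line)
        if m:
            started[m.group(2)] = datetime.strptime(m.group(1), TS_FORMAT)
            continue
        m = DONE.match(line)
        if m and m.group(2) in started:
            end = datetime.strptime(m.group(1), TS_FORMAT)
            # pop() because RGW reuses the same req pointer for later requests
            durations.append((end - started.pop(m.group(2))).total_seconds())
    return durations


if __name__ == "__main__":
    import sys
    # Usage: python rgw_durations.py < radosgw.log
    for seconds in request_durations(sys.stdin):
        print(f"{seconds:.3f}s")
```

Fed the two requests above, this should report roughly the same 84 s and 266 s that RGW itself prints in the "completing" lines, which suggests the time is spent inside list_bucket on the server rather than in s3cmd.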