Hi,

I am having some trouble with some S3 requests and I am at a loss. After upgrading to Reef a couple of weeks ago, some requests get stuck and never return. The two Ceph clusters are set up to sync the S3 realm bidirectionally. The bucket currently has 479 shards (dynamic resharding). Putting an object (/etc/services) into the bucket via s3cmd works, and deleting it works as well, so I know it is not the entire bucket that is somehow faulty. But when I try to delete a specific prefix, the request that lists the objects under that prefix never comes back. In the example below I only included the request in question, which I aborted with ^C.

$ s3cmd rm -r s3://sql20/pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/ -d
[...snip...]
DEBUG: Canonical Request:
GET
/sql20/
prefix=pgbackrest%2Fbackup%2Fadrpb%2F20240130-200410F%2Fpg_data%2Fbase%2F16560%2F
host:[...snip...]
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:20240306T183435Z

host;x-amz-content-sha256;x-amz-date
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
----------------------
DEBUG: signature-v4 headers: {'x-amz-date': '20240306T183435Z', 'Authorization': 'AWS4-HMAC-SHA256 Credential=VL0FRB7CYGMHBGCD419M/20240306/[...snip...]/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=45b133675535ab611bbf2b9a7a6e40f9f510c0774bf155091dc9a05b76856cb7', 'x-amz-content-sha256': 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'}
DEBUG: Processing request, please wait...
DEBUG: get_hostname(sql20): [...snip...]
DEBUG: ConnMan.get(): re-using connection: [...snip...]#1
DEBUG: format_uri(): /sql20/?prefix=pgbackrest%2Fbackup%2Fadrpb%2F20240130-200410F%2Fpg_data%2Fbase%2F16560%2F
DEBUG: Sending request method_string='GET', uri='/sql20/?prefix=pgbackrest%2Fbackup%2Fadrpb%2F20240130-200410F%2Fpg_data%2Fbase%2F16560%2F', headers={'x-amz-date': '20240306T183435Z', 'Authorization': 'AWS4-HMAC-SHA256 Credential=VL0FRB7CYGMHBGCD419M/20240306/[...snip...]/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=45b133675535ab611bbf2b9a7a6e40f9f510c0774bf155091dc9a05b76856cb7', 'x-amz-content-sha256': 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'}, body=(0 bytes)
^C
DEBUG: Response:
{}
See ya!

The request did not show up in the logs normally, so I set debug_rgw=20 and debug_ms=20 via ceph config set.
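For completeness, these are roughly the commands I used to raise the log levels (client.rgw here stands in for whatever config section the gateways actually use, so treat the section name as an assumption):

$ ceph config set client.rgw debug_rgw 20   # verbose RGW request/op logging
$ ceph config set client.rgw debug_ms 20    # messenger-level logging

# back to the defaults afterwards:
$ ceph config rm client.rgw debug_rgw
$ ceph config rm client.rgw debug_ms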
I tried to isolate the request and looked for its request id: 13321243250692796422. The following is a grep for that request id:

Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.000000000s s3:list_bucket verifying op params
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.000000000s s3:list_bucket pre-executing
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.000000000s s3:list_bucket check rate limiting
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.000000000s s3:list_bucket executing
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.000000000s s3:list_bucket list_objects_ordered: starting attempt 1
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.000000000s s3:list_bucket cls_bucket_list_ordered: request from each of 479 shard(s) for 8 entries to get 1001 total entries
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: currently processing pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101438318.gz from shard 437
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860 obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101438318.gz state=0x55d4237419e8 s->prefetch_data=0
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: skipping pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101438318.gz[]
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: currently processing pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457659_fsm.gz from shard 202
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860 obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457659_fsm.gz state=0x55d4237419e8 s->prefetch_data=0
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: skipping pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457659_fsm.gz[]
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: currently processing pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457662_fsm.gz from shard 420
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860 obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457662_fsm.gz state=0x55d4237419e8 s->prefetch_data=0
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket manifest: total_size = 59
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket get_obj_state: setting s->obj_tag to 107ace7a-a829-4d1c-9cb8-9db30644b786.395658.17352021385636272460
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: got pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457662_fsm.gz
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: currently processing pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457721.gz from shard 444
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860 obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457721.gz state=0x55d4237419e8 s->prefetch_data=0
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: skipping pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457721.gz[]
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: currently processing pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458889.gz from shard 116
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860 obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458889.gz state=0x55d4237419e8 s->prefetch_data=0
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket manifest: total_size = 1040
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket get_obj_state: setting s->obj_tag to 107ace7a-a829-4d1c-9cb8-9db30644b786.395658.2135820855302089522
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: got pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458889.gz
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: currently processing pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458903.gz from shard 452
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860 obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458903.gz state=0x55d4237419e8 s->prefetch_data=0
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: skipping pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458903.gz[]
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: currently processing pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458955.gz from shard 55
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860 obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458955.gz state=0x55d4237419e8 s->prefetch_data=0
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket cls_bucket_list_ordered: skipping pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458955.gz[]
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket cls_bucket_list_ordered: currently processing pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101600036.gz from shard 455
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860 obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101600036.gz state=0x55d4237419e8 s->prefetch_data=0
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket manifest: total_size = 4403
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket get_obj_state: setting s->obj_tag to 107ace7a-a829-4d1c-9cb8-9db30644b786.395658.13169705319307666922
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket cls_bucket_list_ordered: got pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101600036.gz
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket cls_bucket_list_ordered: currently processing pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101600044.gz from shard 407
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860 obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101600044.gz state=0x55d4237419e8 s->prefetch_data=0
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket manifest: total_size = 6264856
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket get_obj_state: setting s->obj_tag to 107ace7a-a829-4d1c-9cb8-9db30644b786.395658.12884446303569321109
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket bucket index object: rechenzentrum.rgw.buckets.index:.dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3.1.34
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket cache get: name=rechenzentrum.rgw.log++bucket.sync-source-hints.sql20 : hit (negative entry)
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket cache get: name=rechenzentrum.rgw.log++bucket.sync-target-hints.sql20 : hit (negative entry)
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket reflect(): flow manager (bucket=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3): adding source pipe: {s={b=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3,z=3caabb9a-4e3b-4b8a-8222-34c33dd63210,az=0},d={b=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3,z=107ace7a-a829-4d1c-9cb8-9db30644b786,az=0}}
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket reflect(): flow manager (bucket=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3): adding dest pipe: {s={b=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3,z=107ace7a-a829-4d1c-9cb8-9db30644b786,az=0},d={b=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3,z=3caabb9a-4e3b-4b8a-8222-34c33dd63210,az=0}}
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket reflect(): flow manager (bucket=): adding source pipe: {s={b=*,z=3caabb9a-4e3b-4b8a-8222-34c33dd63210,az=0},d={b=*,z=107ace7a-a829-4d1c-9cb8-9db30644b786,az=0}}
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket reflect(): flow manager (bucket=): adding dest pipe: {s={b=*,z=107ace7a-a829-4d1c-9cb8-9db30644b786,az=0},d={b=*,z=3caabb9a-4e3b-4b8a-8222-34c33dd63210,az=0}}
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket chain_cache_entry: cache_locator=
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket chain_cache_entry: couldn't find cache locator
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket couldn't put bucket_sync_policy cache entry, might have raced with data changes
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket RGWDataChangesLog::add_entry() bucket.name=sql20 shard_id=34 now=2024-03-06T18:36:17.978389+0000 cur_expiration=1970-01-01T00:00:00.000000+0000

I don't see any clear error, but the last few lines seem odd to me:

- Earlier the flow manager entries still reference the bucket, e.g. flow manager (bucket=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3), but then the bucket is suddenly empty: flow manager (bucket=)
- No cache locator was found ("couldn't find cache locator"); no idea if this is okay or not
- The cur_expiration a few lines later is set to unix time 0 (1970-01-01T00:00:00.000000+0000)
- I did this multiple times and it always seems to be shard 34 that has the issue (see the commands below for what I plan to check on that shard)
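The next thing I intend to look at, though I am not sure it is the right place, is that shard's bucket index object itself, using the pool and object name taken from the "bucket index object:" log line above:

# basic sanity check on index shard 34 and a count of its omap entries
$ rados -p rechenzentrum.rgw.buckets.index stat .dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3.1.34
$ rados -p rechenzentrum.rgw.buckets.index listomapkeys .dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3.1.34 | wc -l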
Did someone see something like this before? Any ideas how to remedy the situation, or at least where or what to look for?

Best,
Christian