Hey Christian, I'm guessing this relates to https://tracker.ceph.com/issues/63373, which tracks a deadlock in S3 DeleteObjects requests when multisite is enabled. rgw_multi_obj_del_max_aio can be set to 1 as a workaround until the reef backport lands.

On Wed, Mar 6, 2024 at 2:41 PM Christian Kugler <syphdias+ceph@xxxxxxxxx> wrote:
>
> Hi,
>
> I am having some trouble with some S3 requests and I am at a loss.
>
> After upgrading to reef a couple of weeks ago, some requests get stuck and never
> return. The two Ceph clusters are set up to sync the S3 realm bidirectionally.
> The bucket has 479 shards (dynamic resharding) at the moment.
>
> Putting an object (/etc/services) into the bucket via s3cmd works, and deleting
> it works as well. So I know it is not just the entire bucket that is somehow faulty.
>
> When I try to delete a specific prefix, the request for listing all objects
> never comes back. In the example below I only included the request in question,
> which I aborted with ^C.
>
> $ s3cmd rm -r s3://sql20/pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/ -d
> [...snip...]
> DEBUG: Canonical Request:
> GET
> /sql20/
> prefix=pgbackrest%2Fbackup%2Fadrpb%2F20240130-200410F%2Fpg_data%2Fbase%2F16560%2F
> host:[...snip...]
> x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> x-amz-date:20240306T183435Z
>
> host;x-amz-content-sha256;x-amz-date
> e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
> ----------------------
> DEBUG: signature-v4 headers: {'x-amz-date': '20240306T183435Z', 'Authorization': 'AWS4-HMAC-SHA256 Credential=VL0FRB7CYGMHBGCD419M/20240306/[...snip...]/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=45b133675535ab611bbf2b9a7a6e40f9f510c0774bf155091dc9a05b76856cb7', 'x-amz-content-sha256': 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'}
> DEBUG: Processing request, please wait...
> DEBUG: get_hostname(sql20): [...snip...]
> DEBUG: ConnMan.get(): re-using connection: [...snip...]#1
> DEBUG: format_uri(): /sql20/?prefix=pgbackrest%2Fbackup%2Fadrpb%2F20240130-200410F%2Fpg_data%2Fbase%2F16560%2F
> DEBUG: Sending request method_string='GET', uri='/sql20/?prefix=pgbackrest%2Fbackup%2Fadrpb%2F20240130-200410F%2Fpg_data%2Fbase%2F16560%2F', headers={'x-amz-date': '20240306T183435Z', 'Authorization': 'AWS4-HMAC-SHA256 Credential=VL0FRB7CYGMHBGCD419M/20240306/[...snip...]/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=45b133675535ab611bbf2b9a7a6e40f9f510c0774bf155091dc9a05b76856cb7', 'x-amz-content-sha256': 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'}, body=(0 bytes)
> ^CDEBUG: Response:
> {}
> See ya!
>
> The request did not show up normally in the logs, so I set debug_rgw=20 and
> debug_ms=20 via ceph config set.
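
For reference, a rough sketch of the ceph config commands involved here, assuming the RGW daemons read their options from the client.rgw config section (adjust the target to match your deployment):

  # raise RGW and messenger debug logging, as mentioned above
  ceph config set client.rgw debug_rgw 20
  ceph config set client.rgw debug_ms 20

  # suggested workaround for the DeleteObjects deadlock, until the reef backport lands
  ceph config set client.rgw rgw_multi_obj_del_max_aio 1

  # drop the debug overrides again once the logs have been captured
  ceph config rm client.rgw debug_rgw
  ceph config rm client.rgw debug_ms
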
>
> I tried to isolate the request and looked for its request id: 13321243250692796422.
> The following is a grep for the request id:
>
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.000000000s s3:list_bucket verifying op params
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.000000000s s3:list_bucket pre-executing
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.000000000s s3:list_bucket check rate limiting
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.000000000s s3:list_bucket executing
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.000000000s s3:list_bucket list_objects_ordered: starting attempt 1
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.000000000s s3:list_bucket cls_bucket_list_ordered: request from each of 479 shard(s) for 8 entries to get 1001 total entries
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: currently processing pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101438318.gz from shard 437
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860 obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101438318.gz state=0x55d4237419e8 s->prefetch_data=0
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: skipping pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101438318.gz[]
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: currently processing pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457659_fsm.gz from shard 202
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860 obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457659_fsm.gz state=0x55d4237419e8 s->prefetch_data=0
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: skipping pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457659_fsm.gz[]
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: currently processing pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457662_fsm.gz from shard 420
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860 obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457662_fsm.gz state=0x55d4237419e8 s->prefetch_data=0
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket manifest: total_size = 59
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket get_obj_state: setting s->obj_tag to 107ace7a-a829-4d1c-9cb8-9db30644b786.395658.17352021385636272460
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: got pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457662_fsm.gz
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: currently processing pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457721.gz from shard 444
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860 obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457721.gz state=0x55d4237419e8 s->prefetch_data=0
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: skipping pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457721.gz[]
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: currently processing pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458889.gz from shard 116
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860 obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458889.gz state=0x55d4237419e8 s->prefetch_data=0
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket manifest: total_size = 1040
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket get_obj_state: setting s->obj_tag to 107ace7a-a829-4d1c-9cb8-9db30644b786.395658.2135820855302089522
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: got pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458889.gz
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: currently processing pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458903.gz from shard 452
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860 obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458903.gz state=0x55d4237419e8 s->prefetch_data=0
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: skipping pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458903.gz[]
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket cls_bucket_list_ordered: currently processing pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458955.gz from shard 55
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860 obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458955.gz state=0x55d4237419e8 s->prefetch_data=0
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket cls_bucket_list_ordered: skipping pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458955.gz[]
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket cls_bucket_list_ordered: currently processing pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101600036.gz from shard 455
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860 obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101600036.gz state=0x55d4237419e8 s->prefetch_data=0
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket manifest: total_size = 4403
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket get_obj_state: setting s->obj_tag to 107ace7a-a829-4d1c-9cb8-9db30644b786.395658.13169705319307666922
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket cls_bucket_list_ordered: got pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101600036.gz
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket cls_bucket_list_ordered: currently processing pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101600044.gz from shard 407
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860 obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101600044.gz state=0x55d4237419e8 s->prefetch_data=0
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket manifest: total_size = 6264856
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket get_obj_state: setting s->obj_tag to 107ace7a-a829-4d1c-9cb8-9db30644b786.395658.12884446303569321109
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket bucket index object: rechenzentrum.rgw.buckets.index:.dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3.1.34
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket cache get: name=rechenzentrum.rgw.log++bucket.sync-source-hints.sql20 : hit (negative entry)
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket cache get: name=rechenzentrum.rgw.log++bucket.sync-target-hints.sql20 : hit (negative entry)
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket reflect(): flow manager (bucket=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3): adding source pipe: {s={b=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3,z=3caabb9a-4e3b-4b8a-8222-34c33dd63210,az=0},d={b=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3,z=107ace7a-a829-4d1c-9cb8-9db30644b786,az=0}}
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket reflect(): flow manager (bucket=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3): adding dest pipe: {s={b=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3,z=107ace7a-a829-4d1c-9cb8-9db30644b786,az=0},d={b=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3,z=3caabb9a-4e3b-4b8a-8222-34c33dd63210,az=0}}
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket reflect(): flow manager (bucket=): adding source pipe: {s={b=*,z=3caabb9a-4e3b-4b8a-8222-34c33dd63210,az=0},d={b=*,z=107ace7a-a829-4d1c-9cb8-9db30644b786,az=0}}
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket reflect(): flow manager (bucket=): adding dest pipe: {s={b=*,z=107ace7a-a829-4d1c-9cb8-9db30644b786,az=0},d={b=*,z=3caabb9a-4e3b-4b8a-8222-34c33dd63210,az=0}}
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket chain_cache_entry: cache_locator=
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket chain_cache_entry: couldn't find cache locator
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket couldn't put bucket_sync_policy cache entry, might have raced with data changes
> Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s s3:list_bucket RGWDataChangesLog::add_entry() bucket.name=sql20 shard_id=34 now=2024-03-06T18:36:17.978389+0000 cur_expiration=1970-01-01T00:00:00.000000+0000
>
> I don't see any clear error, but somehow the last few lines seem odd to me:
> - Where before it said flow manager (bucket=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3),
>   it now has no bucket: flow manager (bucket=)
> - No cache locator was found. No idea if this is okay or not.
> - The cur_expiration a few lines later is set to Unix time 0 (1970-01-01T00:00:00.000000+0000).
> - I did this multiple times and it seems to always be shard 34 that has the issue.
>
> Has anyone seen something like this before?
> Any ideas how to remedy the situation, or at least where or what to look for?
>
> Best,
> Christian
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx