Hanging request in S3

Hi,

I am having some trouble with some S3 requests and I am at a loss.

After upgrading to Reef a couple of weeks ago, some requests get stuck and
never return. The two Ceph clusters are set up to sync the S3 realm
bidirectionally. The bucket currently has 479 shards (dynamic resharding).
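
For reference, the shard count comes from the bucket metadata. Roughly like
this, though whether num_shards shows up directly in bucket stats may depend
on the release:

$ radosgw-admin bucket stats --bucket=sql20 | grep num_shards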

Putting an object (/etc/services) into the bucket via s3cmd works, and
deleting it works as well. So I know it is not the entire bucket that is
somehow faulty.
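
The sanity check was roughly like this (the key name is just a throwaway
example):

$ s3cmd put /etc/services s3://sql20/services-test
$ s3cmd del s3://sql20/services-test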

When I try to delete a specific prefix, the request for listing all objects
never comes back. In the example below I only included the request in
question, which I aborted with ^C.

$ s3cmd rm -r s3://sql20/pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/ -d
[...snip...]
DEBUG: Canonical Request:
GET
/sql20/
prefix=pgbackrest%2Fbackup%2Fadrpb%2F20240130-200410F%2Fpg_data%2Fbase%2F16560%2F
host:[...snip...]
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:20240306T183435Z

host;x-amz-content-sha256;x-amz-date
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
----------------------
DEBUG: signature-v4 headers: {'x-amz-date': '20240306T183435Z',
'Authorization': 'AWS4-HMAC-SHA256
Credential=VL0FRB7CYGMHBGCD419M/20240306/[...snip...]/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=45b133675535ab611bbf2b9a7a6e40f9f510c0774bf155091dc9a05b76856cb7',
'x-amz-content-sha256':
'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'}
DEBUG: Processing request, please wait...
DEBUG: get_hostname(sql20): [...snip...]
DEBUG: ConnMan.get(): re-using connection: [...snip...]#1
DEBUG: format_uri():
/sql20/?prefix=pgbackrest%2Fbackup%2Fadrpb%2F20240130-200410F%2Fpg_data%2Fbase%2F16560%2F
DEBUG: Sending request method_string='GET',
uri='/sql20/?prefix=pgbackrest%2Fbackup%2Fadrpb%2F20240130-200410F%2Fpg_data%2Fbase%2F16560%2F',
headers={'x-amz-date': '20240306T183435Z', 'Authorization':
'AWS4-HMAC-SHA256
Credential=VL0FRB7CYGMHBGCD419M/20240306/[...snip...]/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=45b133675535ab611bbf2b9a7a6e40f9f510c0774bf155091dc9a05b76856cb7',
'x-amz-content-sha256':
'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'},
body=(0 bytes)
^CDEBUG: Response:
{}
See ya!

The request did not show up in the logs at the normal debug levels, so I set
debug_rgw=20 and debug_ms=20 via ceph config set.
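
Roughly like this; the client.rgw section here is an example, the exact
section depends on how the daemons are named:

$ ceph config set client.rgw debug_rgw 20
$ ceph config set client.rgw debug_ms 20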

I tried to isolate the request and looked for its request id:
13321243250692796422
The following is a grep for the request id:
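
The grep itself was roughly like this (the radosgw unit name is a placeholder
and depends on how the gateway is deployed):

$ journalctl -u <radosgw unit> | grep 13321243250692796422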

Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.000000000s
s3:list_bucket verifying op params
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.000000000s
s3:list_bucket pre-executing
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.000000000s
s3:list_bucket check rate limiting
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.000000000s
s3:list_bucket executing
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.000000000s
s3:list_bucket list_objects_ordered: starting attempt 1
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.000000000s
s3:list_bucket cls_bucket_list_ordered: request from each of 479 shard(s)
for 8 entries to get 1001 total entries
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket cls_bucket_list_ordered: currently processing
pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101438318.gz
from shard 437
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860
obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101438318.gz
state=0x55d4237419e8 s->prefetch_data=0
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket cls_bucket_list_ordered: skipping
pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101438318.gz[]
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket cls_bucket_list_ordered: currently processing
pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457659_fsm.gz
from shard 202
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860
obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457659_fsm.gz
state=0x55d4237419e8 s->prefetch_data=0
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket cls_bucket_list_ordered: skipping
pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457659_fsm.gz[]
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket cls_bucket_list_ordered: currently processing
pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457662_fsm.gz
from shard 420
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860
obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457662_fsm.gz
state=0x55d4237419e8 s->prefetch_data=0
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket manifest: total_size = 59
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket get_obj_state: setting s->obj_tag to
107ace7a-a829-4d1c-9cb8-9db30644b786.395658.17352021385636272460
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket cls_bucket_list_ordered: got
pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457662_fsm.gz
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket cls_bucket_list_ordered: currently processing
pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457721.gz
from shard 444
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860
obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457721.gz
state=0x55d4237419e8 s->prefetch_data=0
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket cls_bucket_list_ordered: skipping
pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101457721.gz[]
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket cls_bucket_list_ordered: currently processing
pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458889.gz
from shard 116
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860
obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458889.gz
state=0x55d4237419e8 s->prefetch_data=0
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket manifest: total_size = 1040
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket get_obj_state: setting s->obj_tag to
107ace7a-a829-4d1c-9cb8-9db30644b786.395658.2135820855302089522
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket cls_bucket_list_ordered: got
pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458889.gz
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket cls_bucket_list_ordered: currently processing
pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458903.gz
from shard 452
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860
obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458903.gz
state=0x55d4237419e8 s->prefetch_data=0
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket cls_bucket_list_ordered: skipping
pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458903.gz[]
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket cls_bucket_list_ordered: currently processing
pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458955.gz
from shard 55
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.332010120s
s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860
obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458955.gz
state=0x55d4237419e8 s->prefetch_data=0
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket cls_bucket_list_ordered: skipping
pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101458955.gz[]
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket cls_bucket_list_ordered: currently processing
pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101600036.gz
from shard 455
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860
obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101600036.gz
state=0x55d4237419e8 s->prefetch_data=0
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket manifest: total_size = 4403
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket get_obj_state: setting s->obj_tag to
107ace7a-a829-4d1c-9cb8-9db30644b786.395658.13169705319307666922
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket cls_bucket_list_ordered: got
pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101600036.gz
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket cls_bucket_list_ordered: currently processing
pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101600044.gz
from shard 407
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket get_obj_state: rctx=0x7f74bdc6f860
obj=sql20:pgbackrest/backup/adrpb/20240130-200410F/pg_data/base/16560/101600044.gz
state=0x55d4237419e8 s->prefetch_data=0
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket manifest: total_size = 6264856
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket get_obj_state: setting s->obj_tag to
107ace7a-a829-4d1c-9cb8-9db30644b786.395658.12884446303569321109
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket  bucket index object:
rechenzentrum.rgw.buckets.index:.dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3.1.34
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket cache get:
name=rechenzentrum.rgw.log++bucket.sync-source-hints.sql20 : hit (negative
entry)
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket cache get:
name=rechenzentrum.rgw.log++bucket.sync-target-hints.sql20 : hit (negative
entry)
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket reflect(): flow manager
(bucket=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3): adding
source pipe:
{s={b=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3,z=3caabb9a-4e3b-4b8a-8222-34c33dd63210,az=0},d={b=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3,z=107ace7a-a829-4d1c-9cb8-9db30644b786,az=0}}
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket reflect(): flow manager
(bucket=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3): adding dest
pipe:
{s={b=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3,z=107ace7a-a829-4d1c-9cb8-9db30644b786,az=0},d={b=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3,z=3caabb9a-4e3b-4b8a-8222-34c33dd63210,az=0}}
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket reflect(): flow manager (bucket=): adding source pipe:
{s={b=*,z=3caabb9a-4e3b-4b8a-8222-34c33dd63210,az=0},d={b=*,z=107ace7a-a829-4d1c-9cb8-9db30644b786,az=0}}
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket reflect(): flow manager (bucket=): adding dest pipe:
{s={b=*,z=107ace7a-a829-4d1c-9cb8-9db30644b786,az=0},d={b=*,z=3caabb9a-4e3b-4b8a-8222-34c33dd63210,az=0}}
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket chain_cache_entry: cache_locator=
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket chain_cache_entry: couldn't find cache locator
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket couldn't put bucket_sync_policy cache entry, might have
raced with data changes
Mär 06 19:36:17 radosgw[8318]: req 13321243250692796422 0.336010247s
s3:list_bucket RGWDataChangesLog::add_entry() bucket.name=sql20 shard_id=34
now=2024-03-06T18:36:17.978389+0000
cur_expiration=1970-01-01T00:00:00.000000+0000

I don't see any clear error, but the last few lines seem odd to me:
- Where it said flow manager
  (bucket=sql20:3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3) before,
  afterwards there is no bucket anymore: flow manager (bucket=)
- No cache locator found. No idea whether that is okay or not.
- The cur_expiration a few lines later is set to unix time 0
  (1970-01-01T00:00:00.000000+0000).
- I did this multiple times and it always seems to be shard 34 that has the
  issue (see the command sketch right below).
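
If it helps: the bucket index object named in the log above lives in
rechenzentrum.rgw.buckets.index, so I assume I could poke at its omap
directly with something like the following, though I am not sure what to
look for there:

$ rados -p rechenzentrum.rgw.buckets.index listomapkeys \
    .dir.3caabb9a-4e3b-4b8a-8222-34c33dd63210.10724501.3.1.34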

Has anyone seen something like this before?
Any ideas how to remedy the situation, or at least where or what to look
for?

Best,
Christian



