Re: RGW: HEAD ok but GET fails

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



We see the same symptoms. We have an S3 object for which HEAD requests
return 200, but GET requests return 404. We searched the S3 gateway
logs for write requests that happened on the object before GET
requests started failing. Strangely, we see succeeding GET requests
and then failing GET requests without any write request (such as
DELETE) being logged in between. So, we're unsure whether this is
related to the known bug in DELETE requests mentioned earlier in this
thread (https://tracker.ceph.com/issues/63935).

Excerpts from the S3 gateway logs for the object in question:


```
...
# Last upload of the file
2025-01-16 08:09:23.112 debug 2025-01-16T08:09:23.111+0000
7f4b739f7640  1 beast: 0x7f4adc1094a0: 10.70.3.26 -
obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1
[16/Jan/2025:08:09:23.105 +0000] "POST
/buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?uploads=&x-id=CreateMultipartUpload
HTTP/1.1" 200 359 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4
md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1 ft/s3-transfer" -
latency=0.006000043s
2025-01-16 08:09:23.457 debug 2025-01-16T08:09:23.456+0000
7f137c405640  1 beast: 0x7f13049b64a0: 10.70.3.26 -
obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1
[16/Jan/2025:08:09:23.168 +0000] "PUT
/buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?partNumber=4&uploadId=2~POfVWPePjR4HDk48JZYll6Vx8izyqni&x-id=UploadPart
HTTP/1.1" 200 5242880 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4
md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1 ft/s3-transfer" -
latency=0.288000762s
2025-01-16 08:09:23.460 debug 2025-01-16T08:09:23.460+0000
7fc386dfa640  1 beast: 0x7fc2fef834a0: 10.70.3.26 -
obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1
[16/Jan/2025:08:09:23.161 +0000] "PUT
/buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?partNumber=2&uploadId=2~POfVWPePjR4HDk48JZYll6Vx8izyqni&x-id=UploadPart
HTTP/1.1" 200 5242880 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4
md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1 ft/s3-transfer" -
latency=0.298000038s
2025-01-16 08:09:23.461 debug 2025-01-16T08:09:23.460+0000
7f4b81212640  1 beast: 0x7f4adc1094a0: 10.70.3.26 -
obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1
[16/Jan/2025:08:09:23.158 +0000] "PUT
/buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?partNumber=5&uploadId=2~POfVWPePjR4HDk48JZYll6Vx8izyqni&x-id=UploadPart
HTTP/1.1" 200 5242880 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4
md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1 ft/s3-transfer" -
latency=0.302002162s
2025-01-16 08:09:23.461 debug 2025-01-16T08:09:23.460+0000
7f4b629d5640  1 beast: 0x7f4adab5e4a0: 10.70.3.26 -
obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1
[16/Jan/2025:08:09:23.162 +0000] "PUT
/buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?partNumber=3&uploadId=2~POfVWPePjR4HDk48JZYll6Vx8izyqni&x-id=UploadPart
HTTP/1.1" 200 5242880 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4
md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1 ft/s3-transfer" -
latency=0.298002154s
2025-01-16 08:09:23.461 debug 2025-01-16T08:09:23.460+0000
7f4ba5a5b640  1 beast: 0x7f4adaa5c4a0: 10.70.3.26 -
obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1
[16/Jan/2025:08:09:23.162 +0000] "PUT
/buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?partNumber=1&uploadId=2~POfVWPePjR4HDk48JZYll6Vx8izyqni&x-id=UploadPart
HTTP/1.1" 200 5242880 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4
md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1 ft/s3-transfer" -
latency=0.298002154s
2025-01-16 08:09:23.520 debug 2025-01-16T08:09:23.519+0000
7f4bd8ac1640  1 beast: 0x7f4adab5e4a0: 10.70.3.26 -
obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1
[16/Jan/2025:08:09:23.465 +0000] "PUT
/buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?partNumber=6&uploadId=2~POfVWPePjR4HDk48JZYll6Vx8izyqni&x-id=UploadPart
HTTP/1.1" 200 4225533 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4
md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1 ft/s3-transfer" -
latency=0.053000379s
2025-01-16 08:09:23.542 debug 2025-01-16T08:09:23.541+0000
7f4ba3a57640  1 beast: 0x7f4ada9db4a0: 10.70.3.26 -
obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1
[16/Jan/2025:08:09:23.520 +0000] "POST
/buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?uploadId=2~POfVWPePjR4HDk48JZYll6Vx8izyqni&x-id=CompleteMultipartUpload
HTTP/1.1" 200 548 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4
md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1 ft/s3-transfer" -
latency=0.021000152s
...
# GET requests return 200
2025-01-16 17:27:57.574 debug 2025-01-16T17:27:57.573+0000
7f138d427640  1 beast: 0x7f13048334a0: 10.70.3.93 -
obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1
[16/Jan/2025:17:27:57.498 +0000] "GET
/buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?x-id=GetObject
HTTP/1.1" 200 30439933 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4
md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1" - latency=0.075000212s
...
# Last write request we could find in the logs (a CopyObject request)
2025-01-17 08:11:23.721 debug 2025-01-17T08:11:23.719+0000
7fc394e16640  1 ====== starting new request req=0x7fc300cbd4a0 =====
2025-01-17 08:11:23.721 debug 2025-01-17T08:11:23.720+0000
7fc3da6a1640  1 ====== req done req=0x7fc3009b74a0 op status=0
http_status=200 latency=0.007000001s ======
2025-01-17 08:11:23.721 debug 2025-01-17T08:11:23.720+0000
7fc3da6a1640  1 beast: 0x7fc3009b74a0: 10.70.3.153 -
obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1
[17/Jan/2025:08:11:23.713 +0000] "PUT
/buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?x-id=CopyObject
HTTP/1.1" 200 252 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4
md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1" - latency=0.007000001s
...
# Last GET request returning 200
2025-01-17 09:34:50.256 debug 2025-01-17T09:34:50.255+0000
7fc420f2e640  1 beast: 0x7fc3009b74a0: 10.70.3.91 -
obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1
[17/Jan/2025:09:34:50.151 +0000] "GET
/buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?x-id=GetObject
HTTP/1.1" 200 30439933 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4
md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1" - latency=0.104000047s
2025-01-17 09:36:47.138 debug 2025-01-17T09:36:47.137+0000
7f13da4c1640  1 beast: 0x7f13061e64a0: 10.70.3.91 -
obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1
[17/Jan/2025:09:36:47.136 +0000] "HEAD
/buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059
HTTP/1.1" 200 0 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4
md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1" - latency=0.001000003s
2025-01-17 10:01:57.580 debug 2025-01-17T10:01:57.579+0000
7f4bf32f6640  1 beast: 0x7f4adace14a0: 10.70.3.93 -
obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1
[17/Jan/2025:10:01:57.578 +0000] "HEAD
/buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059
HTTP/1.1" 200 0 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4
md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1" - latency=0.001000007s
2025-01-17 10:02:16.805 debug 2025-01-17T10:02:16.804+0000
7f4b39983640  1 beast: 0x7f4ada4504a0: 10.70.3.28 -
obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1
[17/Jan/2025:10:02:16.803 +0000] "HEAD
/buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059
HTTP/1.1" 200 0 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4
md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1" - latency=0.001000008s
2025-01-17 10:23:35.706 debug 2025-01-17T10:23:35.705+0000
7f142f56b640  1 beast: 0x7f13058534a0: 10.70.3.91 -
obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1
[17/Jan/2025:10:23:35.704 +0000] "HEAD
/buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059
HTTP/1.1" 200 0 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4
md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1" - latency=0.001000003s
2025-01-17 10:25:38.406 debug 2025-01-17T10:25:38.405+0000
7f4b719f3640  1 beast: 0x7f4adbafd4a0: 10.70.3.28 -
obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1
[17/Jan/2025:10:25:38.404 +0000] "HEAD
/buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059
HTTP/1.1" 200 0 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4
md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1" - latency=0.001000007s
# Now GET requests return 404s. Note there are only HEAD requests
visible in the logs since the last GET request that returned a 200.
2025-01-17 10:31:56.619 debug 2025-01-17T10:31:56.617+0000
7fc37e5e9640  0 req 17300774944515556197 0.001000000s s3:get_obj
iterate_obj() failed with -2
2025-01-17 10:31:56.619 debug 2025-01-17T10:31:56.618+0000
7fc35058d640  1 ====== req done req=0x7fc3011464a0 op status=-2
http_status=404 latency=0.002000001s ======
2025-01-17 10:31:56.619 debug 2025-01-17T10:31:56.618+0000
7fc35058d640  1 beast: 0x7fc3011464a0: 10.70.3.93 -
obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1
[17/Jan/2025:10:31:56.615 +0000] "GET
/buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?x-id=GetObject
HTTP/1.1" 404 294 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4
md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1" - latency=0.002000001s
# HEAD requets still return 200.
2025-01-17 10:32:48.196 debug 2025-01-17T10:32:48.195+0000
7fc3a2e32640  1 beast: 0x7fc2feb7b4a0: 10.70.3.91 -
obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1
[17/Jan/2025:10:32:48.194 +0000] "HEAD
/buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059
HTTP/1.1" 200 0 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4
md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1" - latency=0.001000001s
```

Here's the bucket index entry. The mtime matches the time of the last
CopyObject request:

```
bash-5.1$ radosgw-admin bi list
--bucket=buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4
...
    {
        "type": "plain",
        "idx": "blobs/sha256:7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059",
        "entry": {
            "name":
"blobs/sha256:7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059",
            "instance": "",
            "ver": {
                "pool": 20,
                "epoch": 0
            },
            "locator": "",
            "exists": true,
            "meta": {
                "category": 1,
                "size": 30439933,
                "mtime": "2025-01-17T08:11:23.715427Z",
                "etag": "ea3dc44b4f3e9fa8874c0022664edeaa-6",
                "storage_class": "",
                "owner":
"obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1",
                "owner_display_name":
"obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1",
                "content_type": "",
                "accounted_size": 30439933,
                "user_data": "",
                "appendable": false
            },
            "tag":
"26caa730-b29f-4262-ac51-6a47db5585d2.75677055.5371543061660465468",
            "flags": 0,
            "pending_map": [],
            "versioned_epoch": 0
        }
    },
```

There is a RADOS object with the relevant name, matching mtime, but with size 0:

```
bash-5.1$ rados -p general-s3.rgw.buckets.data ls | grep
7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059
26caa730-b29f-4262-ac51-6a47db5585d2.75671339.1_blobs/sha256:7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059

bash-5.1$ rados -p general-s3.rgw.buckets.data stat
26caa730-b29f-4262-ac51-6a47db5585d2.75671339.1_blobs/sha256:7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059
general-s3.rgw.buckets.data/26caa730-b29f-4262-ac51-6a47db5585d2.75671339.1_blobs/sha256:7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059
mtime 2025-01-17T08:11:23.000000+0000, size 0
```

We run Ceph 19.2.0 on K8s, orchestrated with Rook. We have multiple
replicas for the S3 gateway. The problems started to appear after
updating from Reef to Squid. So far, we only noticed the problem for a
specific bucket used as a BuildKit cache. We deleted and recreated the
bucket for BuildKit, and the problem appeared again after running our
BuildKit jobs against the new bucket for a few days.

At first, we thought the problem was caused by a bug/unwanted behavior
in the CopyObject implementation. It looks like CopyObject fails to
update the index but nevertheless updates the RADOS objects holding
the S3 object, replacing (multiple parts of?) a large S3 object with a
single object of size 0. However, the fact that we see a successful
GET request ten minutes after the last CopyObject request casts doubt
on the  hypothesis that CopyObject is the problem.

We'd be very grateful for any advice on how to troubleshoot this further.
Simon
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Ceph Dev]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux