We see the same symptoms. We have an S3 object for which HEAD requests return 200, but GET requests return 404. We searched the S3 gateway logs for write requests that happened on the object before GET requests started failing. Strangely, we see succeeding GET requests and then failing GET requests without any write request (such as DELETE) being logged in between. So, we're unsure whether this is related to the known bug in DELETE requests mentioned earlier in this thread (https://tracker.ceph.com/issues/63935). Excerpts from the S3 gateway logs for the object in question: ``` ... # Last upload of the file 2025-01-16 08:09:23.112 debug 2025-01-16T08:09:23.111+0000 7f4b739f7640 1 beast: 0x7f4adc1094a0: 10.70.3.26 - obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1 [16/Jan/2025:08:09:23.105 +0000] "POST /buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?uploads=&x-id=CreateMultipartUpload HTTP/1.1" 200 359 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4 md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1 ft/s3-transfer" - latency=0.006000043s 2025-01-16 08:09:23.457 debug 2025-01-16T08:09:23.456+0000 7f137c405640 1 beast: 0x7f13049b64a0: 10.70.3.26 - obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1 [16/Jan/2025:08:09:23.168 +0000] "PUT /buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?partNumber=4&uploadId=2~POfVWPePjR4HDk48JZYll6Vx8izyqni&x-id=UploadPart HTTP/1.1" 200 5242880 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4 md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1 ft/s3-transfer" - latency=0.288000762s 2025-01-16 08:09:23.460 debug 2025-01-16T08:09:23.460+0000 7fc386dfa640 1 beast: 0x7fc2fef834a0: 10.70.3.26 - obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1 [16/Jan/2025:08:09:23.161 +0000] "PUT /buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?partNumber=2&uploadId=2~POfVWPePjR4HDk48JZYll6Vx8izyqni&x-id=UploadPart HTTP/1.1" 200 5242880 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4 md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1 ft/s3-transfer" - latency=0.298000038s 2025-01-16 08:09:23.461 debug 2025-01-16T08:09:23.460+0000 7f4b81212640 1 beast: 0x7f4adc1094a0: 10.70.3.26 - obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1 [16/Jan/2025:08:09:23.158 +0000] "PUT /buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?partNumber=5&uploadId=2~POfVWPePjR4HDk48JZYll6Vx8izyqni&x-id=UploadPart HTTP/1.1" 200 5242880 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4 md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1 ft/s3-transfer" - latency=0.302002162s 2025-01-16 08:09:23.461 debug 2025-01-16T08:09:23.460+0000 7f4b629d5640 1 beast: 0x7f4adab5e4a0: 10.70.3.26 - obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1 [16/Jan/2025:08:09:23.162 +0000] "PUT /buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?partNumber=3&uploadId=2~POfVWPePjR4HDk48JZYll6Vx8izyqni&x-id=UploadPart HTTP/1.1" 200 5242880 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4 md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1 ft/s3-transfer" - latency=0.298002154s 2025-01-16 08:09:23.461 debug 2025-01-16T08:09:23.460+0000 7f4ba5a5b640 1 beast: 0x7f4adaa5c4a0: 10.70.3.26 - obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1 [16/Jan/2025:08:09:23.162 +0000] "PUT /buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?partNumber=1&uploadId=2~POfVWPePjR4HDk48JZYll6Vx8izyqni&x-id=UploadPart HTTP/1.1" 200 5242880 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4 md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1 ft/s3-transfer" - latency=0.298002154s 2025-01-16 08:09:23.520 debug 2025-01-16T08:09:23.519+0000 7f4bd8ac1640 1 beast: 0x7f4adab5e4a0: 10.70.3.26 - obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1 [16/Jan/2025:08:09:23.465 +0000] "PUT /buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?partNumber=6&uploadId=2~POfVWPePjR4HDk48JZYll6Vx8izyqni&x-id=UploadPart HTTP/1.1" 200 4225533 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4 md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1 ft/s3-transfer" - latency=0.053000379s 2025-01-16 08:09:23.542 debug 2025-01-16T08:09:23.541+0000 7f4ba3a57640 1 beast: 0x7f4ada9db4a0: 10.70.3.26 - obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1 [16/Jan/2025:08:09:23.520 +0000] "POST /buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?uploadId=2~POfVWPePjR4HDk48JZYll6Vx8izyqni&x-id=CompleteMultipartUpload HTTP/1.1" 200 548 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4 md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1 ft/s3-transfer" - latency=0.021000152s ... # GET requests return 200 2025-01-16 17:27:57.574 debug 2025-01-16T17:27:57.573+0000 7f138d427640 1 beast: 0x7f13048334a0: 10.70.3.93 - obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1 [16/Jan/2025:17:27:57.498 +0000] "GET /buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?x-id=GetObject HTTP/1.1" 200 30439933 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4 md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1" - latency=0.075000212s ... # Last write request we could find in the logs (a CopyObject request) 2025-01-17 08:11:23.721 debug 2025-01-17T08:11:23.719+0000 7fc394e16640 1 ====== starting new request req=0x7fc300cbd4a0 ===== 2025-01-17 08:11:23.721 debug 2025-01-17T08:11:23.720+0000 7fc3da6a1640 1 ====== req done req=0x7fc3009b74a0 op status=0 http_status=200 latency=0.007000001s ====== 2025-01-17 08:11:23.721 debug 2025-01-17T08:11:23.720+0000 7fc3da6a1640 1 beast: 0x7fc3009b74a0: 10.70.3.153 - obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1 [17/Jan/2025:08:11:23.713 +0000] "PUT /buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?x-id=CopyObject HTTP/1.1" 200 252 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4 md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1" - latency=0.007000001s ... # Last GET request returning 200 2025-01-17 09:34:50.256 debug 2025-01-17T09:34:50.255+0000 7fc420f2e640 1 beast: 0x7fc3009b74a0: 10.70.3.91 - obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1 [17/Jan/2025:09:34:50.151 +0000] "GET /buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?x-id=GetObject HTTP/1.1" 200 30439933 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4 md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1" - latency=0.104000047s 2025-01-17 09:36:47.138 debug 2025-01-17T09:36:47.137+0000 7f13da4c1640 1 beast: 0x7f13061e64a0: 10.70.3.91 - obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1 [17/Jan/2025:09:36:47.136 +0000] "HEAD /buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059 HTTP/1.1" 200 0 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4 md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1" - latency=0.001000003s 2025-01-17 10:01:57.580 debug 2025-01-17T10:01:57.579+0000 7f4bf32f6640 1 beast: 0x7f4adace14a0: 10.70.3.93 - obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1 [17/Jan/2025:10:01:57.578 +0000] "HEAD /buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059 HTTP/1.1" 200 0 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4 md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1" - latency=0.001000007s 2025-01-17 10:02:16.805 debug 2025-01-17T10:02:16.804+0000 7f4b39983640 1 beast: 0x7f4ada4504a0: 10.70.3.28 - obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1 [17/Jan/2025:10:02:16.803 +0000] "HEAD /buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059 HTTP/1.1" 200 0 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4 md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1" - latency=0.001000008s 2025-01-17 10:23:35.706 debug 2025-01-17T10:23:35.705+0000 7f142f56b640 1 beast: 0x7f13058534a0: 10.70.3.91 - obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1 [17/Jan/2025:10:23:35.704 +0000] "HEAD /buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059 HTTP/1.1" 200 0 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4 md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1" - latency=0.001000003s 2025-01-17 10:25:38.406 debug 2025-01-17T10:25:38.405+0000 7f4b719f3640 1 beast: 0x7f4adbafd4a0: 10.70.3.28 - obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1 [17/Jan/2025:10:25:38.404 +0000] "HEAD /buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059 HTTP/1.1" 200 0 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4 md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1" - latency=0.001000007s # Now GET requests return 404s. Note there are only HEAD requests visible in the logs since the last GET request that returned a 200. 2025-01-17 10:31:56.619 debug 2025-01-17T10:31:56.617+0000 7fc37e5e9640 0 req 17300774944515556197 0.001000000s s3:get_obj iterate_obj() failed with -2 2025-01-17 10:31:56.619 debug 2025-01-17T10:31:56.618+0000 7fc35058d640 1 ====== req done req=0x7fc3011464a0 op status=-2 http_status=404 latency=0.002000001s ====== 2025-01-17 10:31:56.619 debug 2025-01-17T10:31:56.618+0000 7fc35058d640 1 beast: 0x7fc3011464a0: 10.70.3.93 - obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1 [17/Jan/2025:10:31:56.615 +0000] "GET /buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059?x-id=GetObject HTTP/1.1" 404 294 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4 md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1" - latency=0.002000001s # HEAD requets still return 200. 2025-01-17 10:32:48.196 debug 2025-01-17T10:32:48.195+0000 7fc3a2e32640 1 beast: 0x7fc2feb7b4a0: 10.70.3.91 - obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1 [17/Jan/2025:10:32:48.194 +0000] "HEAD /buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4/blobs/sha256%3A7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059 HTTP/1.1" 200 0 - "aws-sdk-go-v2/1.24.1 os/linux lang/go#1.23.4 md/GOOS#linux md/GOARCH#amd64 api/s3#1.48.1" - latency=0.001000001s ``` Here's the bucket index entry. The mtime matches the time of the last CopyObject request: ``` bash-5.1$ radosgw-admin bi list --bucket=buildkit-cache-466cec97-dd8f-467d-b4bd-73df82db10c4 ... { "type": "plain", "idx": "blobs/sha256:7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059", "entry": { "name": "blobs/sha256:7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059", "instance": "", "ver": { "pool": 20, "epoch": 0 }, "locator": "", "exists": true, "meta": { "category": 1, "size": 30439933, "mtime": "2025-01-17T08:11:23.715427Z", "etag": "ea3dc44b4f3e9fa8874c0022664edeaa-6", "storage_class": "", "owner": "obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1", "owner_display_name": "obc-gitlab-runner-k8s-buildkit-cache-bucket-90b2b9dd-deab-4519-8802-082c9bc658b1", "content_type": "", "accounted_size": 30439933, "user_data": "", "appendable": false }, "tag": "26caa730-b29f-4262-ac51-6a47db5585d2.75677055.5371543061660465468", "flags": 0, "pending_map": [], "versioned_epoch": 0 } }, ``` There is a RADOS object with the relevant name, matching mtime, but with size 0: ``` bash-5.1$ rados -p general-s3.rgw.buckets.data ls | grep 7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059 26caa730-b29f-4262-ac51-6a47db5585d2.75671339.1_blobs/sha256:7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059 bash-5.1$ rados -p general-s3.rgw.buckets.data stat 26caa730-b29f-4262-ac51-6a47db5585d2.75671339.1_blobs/sha256:7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059 general-s3.rgw.buckets.data/26caa730-b29f-4262-ac51-6a47db5585d2.75671339.1_blobs/sha256:7478e0ac0f23f94b2f27848fbcdf804a670fbf8d4bab26df842d40a10cd33059 mtime 2025-01-17T08:11:23.000000+0000, size 0 ``` We run Ceph 19.2.0 on K8s, orchestrated with Rook. We have multiple replicas for the S3 gateway. The problems started to appear after updating from Reef to Squid. So far, we only noticed the problem for a specific bucket used as a BuildKit cache. We deleted and recreated the bucket for BuildKit, and the problem appeared again after running our BuildKit jobs against the new bucket for a few days. At first, we thought the problem was caused by a bug/unwanted behavior in the CopyObject implementation. It looks like CopyObject fails to update the index but nevertheless updates the RADOS objects holding the S3 object, replacing (multiple parts of?) a large S3 object with a single object of size 0. However, the fact that we see a successful GET request ten minutes after the last CopyObject request casts doubt on the hypothesis that CopyObject is the problem. We'd be very grateful for any advice on how to troubleshoot this further. Simon _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx