RGW: HEAD ok but GET fails

Eugen Block <eblock@xxxxxx> · Fri, 09 Aug 2024 06:54:04 +0000

Hi,

I'm trying to help a customer with an RGW question; maybe someone here
can help me out. Their S3 application reports errors every now and
then, complaining about missing objects. This is what RGW logs:

[08/Aug/2024:08:23:47.540 +0000] "HEAD /hchsarchiv/20240622221326-20540623-aeaa962adadf5bc92050823dd03039197987f9d16f70c793599e361b6a5910c83941a0ceb3c7bfccb0a8ecbae212c701958d8a316b4fb172a54040b26b3a2508 HTTP/1.1" 200 0 - "aws-sdk-dotnet-45/3.5.9.7 aws-sdk-dotnet-core/3.5.3.7 .NET_Runtime/4.0 .NET_Framework/4.0 OS/Microsoft_Windows_NT_10.0.14393.0 ClientSync" - latency=0.003999992s

[08/Aug/2024:08:23:47.552 +0000] "GET /hchsarchiv/20240622221326-20540623-aeaa962adadf5bc92050823dd03039197987f9d16f70c793599e361b6a5910c83941a0ceb3c7bfccb0a8ecbae212c701958d8a316b4fb172a54040b26b3a2508 HTTP/1.1" 404 242 - "aws-sdk-dotnet-45/3.5.9.7 aws-sdk-dotnet-core/3.5.3.7 .NET_Runtime/4.0 .NET_Framework/4.0 OS/Microsoft_Windows_NT_10.0.14393.0 ClientSync" bytes=0-2097151 latency=0.003999992s
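
Once the longer log retention is in place, pairs like the two entries above could be picked out of the rotated logs with something like the following. This is only a minimal sketch: it assumes each log entry sits on a single line and that the request and status fields look exactly like in the excerpt. The function name and the regex are made up for illustration and are not part of RGW.

```python
import re
from collections import defaultdict

# Matches the quoted request and the status code of a log entry such as
#   [...] "HEAD /bucket/key HTTP/1.1" 200 0 - "user-agent" - latency=...
# Assumption: one entry per line, format as in the excerpt above.
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+)\s+(?P<path>\S+)\s+HTTP/[\d.]+"\s+(?P<status>\d{3})\s'
)


def find_head_ok_get_missing(lines):
    """Return the sorted paths that saw both a HEAD 200 and a GET 404."""
    seen = defaultdict(set)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            seen[match.group("path")].add(
                (match.group("method"), int(match.group("status")))
            )
    return sorted(
        path
        for path, results in seen.items()
        if ("HEAD", 200) in results and ("GET", 404) in results
    )
```

Feeding it the lines of a log file (for example `find_head_ok_get_missing(open(path))`, with `path` pointing at wherever the RGW log ends up) would return the affected object paths. These could then be matched against the PUT requests once those are retained.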

So apparently, the HEAD request succeeds, but the GET request for the
same object returns 404. We can confirm that the queried object indeed
doesn't exist in the data pool, yet the object metadata must apparently
have been written successfully. Unfortunately, we don't have enough
logs to find the corresponding PUT request; they just increased the
logrotate retention so that we can inspect it the next time it
happens. My question is: should they see some metadata for this object
in the listomapkeys/listomapvals output in the index pool?
The docs [0] state this about Index Transactions:

Because the head objects are stored in different rados objects than  
the bucket indices, we can’t update both atomically with a single  
rados operation. In order to satisfy the Consistency Guarantee for  
listing operations, we have to coordinate these two object writes  
using a three-step bucket index transaction:

1. Prepare a transaction on its bucket index object.
2. Write or delete the head object.
3. Commit the transaction on the bucket index object (or cancel the  
transaction if step 2 fails).

Object writes and deletes may race with each other, so a given  
object may have more than one prepared transaction at a time. RGW  
considers an object entry to be ‘pending’ if there are any  
outstanding transactions, or ‘completed’ otherwise.
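
To make the quoted three steps concrete, here is a toy in-memory model of them. It is not RGW code: the class, the function, and the `fail_write` switch are hypothetical names for illustration, and a plain dict stands in for the data pool. It only illustrates the prepare / write / commit-or-cancel ordering and the 'pending' versus 'completed' distinction. It does not try to reproduce the HEAD/GET mismatch described above.

```python
class ToyBucketIndex:
    """In-memory stand-in for one bucket index object (hypothetical)."""

    def __init__(self):
        # key -> {"exists": bool, "pending": set of transaction tags}
        self.entries = {}

    def prepare(self, key, tag):
        entry = self.entries.setdefault(key, {"exists": False, "pending": set()})
        entry["pending"].add(tag)

    def complete(self, key, tag, exists):
        entry = self.entries[key]
        entry["pending"].discard(tag)
        entry["exists"] = exists

    def cancel(self, key, tag):
        self.entries[key]["pending"].discard(tag)

    def state(self, key):
        # 'pending' if any transaction is outstanding, 'completed' otherwise
        return "pending" if self.entries[key]["pending"] else "completed"


def put_object(index, data_pool, key, payload, tag, fail_write=False):
    """Run the three-step transaction for a single write."""
    index.prepare(key, tag)                      # step 1: prepare
    try:
        if fail_write:
            raise OSError("simulated head object write failure")
        data_pool[key] = payload                 # step 2: write head object
    except OSError:
        index.cancel(key, tag)                   # step 3: cancel on failure
        return False
    index.complete(key, tag, exists=True)        # step 3: commit
    return True
```

In this model, a writer that stops between step 1 and step 3 leaves its tag in the `pending` set, so `state()` keeps reporting 'pending' for that key until the tag is completed or cancelled.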

Could this be such a race condition that "just happens" from time to
time? Or can it somehow be prevented? Right now the cleanup process is
a bit complicated on the application side.
I'm not the most experienced RGW user, so I'd be grateful for any
pointers here.

Thanks!
Eugen

[0] https://docs.ceph.com/en/reef/dev/radosgw/bucket_index/#index-transaction