Hi,
I'm trying to help a customer with an RGW question, maybe someone here
can help me out. Their S3 application reports errors every now and
then, complaining about missing objects. This is what RGW logs:
[08/Aug/2024:08:23:47.540 +0000] "HEAD
/hchsarchiv/20240622221326-20540623-aeaa962adadf5bc92050823dd03039197987f9d16f70c793599e361b6a5910c83941a0ceb3c7bfccb0a8ecbae212c701958d8a316b4fb172a54040b26b3a2508 HTTP/1.1" 200 0 - "aws-sdk-dotnet-45/3.5.9.7 aws-sdk-dotnet-core/3.5.3.7 .NET_Runtime/4.0 .NET_Framework/4.0 OS/Microsoft_Windows_NT_10.0.14393.0 ClientSync" -
latency=0.003999992s
[08/Aug/2024:08:23:47.552 +0000] "GET
/hchsarchiv/20240622221326-20540623-aeaa962adadf5bc92050823dd03039197987f9d16f70c793599e361b6a5910c83941a0ceb3c7bfccb0a8ecbae212c701958d8a316b4fb172a54040b26b3a2508 HTTP/1.1" 404 242 - "aws-sdk-dotnet-45/3.5.9.7 aws-sdk-dotnet-core/3.5.3.7 .NET_Runtime/4.0 .NET_Framework/4.0 OS/Microsoft_Windows_NT_10.0.14393.0 ClientSync" bytes=0-2097151
latency=0.003999992s
So apparently, the HEAD request succeeds, but the GET request returns
404. We can confirm that the requested object indeed doesn't exist
in the data pool, yet the object metadata must have been written
successfully. Unfortunately, we don't have enough logs to find the
corresponding PUT request; they have just increased the logrotate
retention so that we can inspect it the next time it happens. But my
question is: should they see some metadata for this object in the
listomapkeys/listomapvals output in the index pool?
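Once more logs are retained, something like the following could help
pick out the affected keys. This is only a minimal sketch of my own,
assuming the log lines keep the shape shown above (method, path,
"HTTP/x.y", then the status code); the function name
find_head_ok_get_missing is made up for illustration and is not part
of any Ceph tooling.

```python
import re

# Matches the request and status part of a log entry such as
#   "HEAD /bucket/key HTTP/1.1" 200 0 ...
# \s+ also tolerates a line wrap between the method and the path.
REQUEST_RE = re.compile(r'"(HEAD|GET)\s+(\S+)\s+HTTP/[\d.]+"\s+(\d{3})')


def find_head_ok_get_missing(log_text):
    """Return paths that got HEAD 200 and later GET 404, in log order.

    Hypothetical helper: it only looks at the access log text and knows
    nothing about the state of the RADOS pools.
    """
    head_ok = set()    # paths for which a HEAD returned 200
    suspicious = []    # paths that later returned 404 on GET
    for method, path, status in REQUEST_RE.findall(log_text):
        if method == "HEAD" and status == "200":
            head_ok.add(path)
        elif method == "GET" and status == "404" and path in head_ok:
            if path not in suspicious:
                suspicious.append(path)
    return suspicious
```

Feeding it the full text of a log file (for example via
open(path).read()) should give a list of bucket/key paths to compare
against the index and data pools.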
The docs [0] state this about Index Transactions:
Because the head objects are stored in different rados objects than
the bucket indices, we can’t update both atomically with a single
rados operation. In order to satisfy the Consistency Guarantee for
listing operations, we have to coordinate these two object writes
using a three-step bucket index transaction:
1. Prepare a transaction on its bucket index object.
2. Write or delete the head object.
3. Commit the transaction on the bucket index object (or cancel the
transaction if step 2 fails).
Object writes and deletes may race with each other, so a given
object may have more than one prepared transaction at a time. RGW
considers an object entry to be ‘pending’ if there are any
outstanding transactions, or ‘completed’ otherwise.
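To make sure I read the three steps correctly, here is a toy model of
them. It is purely my own illustration and not RGW's actual code: the
class ToyBucketIndex, the function put_object, and the fail_write
switch are all invented names, and the "pools" are plain Python
containers.

```python
class ToyBucketIndex:
    """Toy stand-in for one bucket index object (not RGW code)."""

    def __init__(self):
        self.pending = {}     # key -> set of outstanding transaction tags
        self.entries = set()  # keys that have a committed index entry
        self._next_tag = 0

    def prepare(self, key):
        # Step 1: record an outstanding transaction for this key.
        self._next_tag += 1
        self.pending.setdefault(key, set()).add(self._next_tag)
        return self._next_tag

    def commit(self, key, tag, exists):
        # Step 3 (success): finish the transaction and update the entry.
        self._finish(key, tag)
        if exists:
            self.entries.add(key)
        else:
            self.entries.discard(key)

    def cancel(self, key, tag):
        # Step 3 (failure): finish the transaction, leave the entry as is.
        self._finish(key, tag)

    def _finish(self, key, tag):
        tags = self.pending.get(key, set())
        tags.discard(tag)
        if not tags:
            self.pending.pop(key, None)

    def state(self, key):
        # 'pending' while any transaction is outstanding, else 'completed'.
        return "pending" if key in self.pending else "completed"


def put_object(index, head_objects, key, data, fail_write=False):
    """Write one object using prepare / write head / commit-or-cancel."""
    tag = index.prepare(key)                 # step 1
    try:
        if fail_write:
            raise IOError("simulated head object write failure")
        head_objects[key] = data             # step 2
    except IOError:
        index.cancel(key, tag)               # step 3, cancel
        return False
    index.commit(key, tag, exists=True)      # step 3, commit
    return True
```

In this model a failed write in step 2 leaves neither an index entry
nor a head object behind, and two overlapping prepares keep the key
'pending' until both have been committed or cancelled.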
Could this be such a race condition which "just happens" from time to
time? Or can this somehow be prevented from happening? Because right
now the cleanup process is a bit complicated application-wise.
I'm not the most experienced RGW user, so I'd be grateful for any
pointers here.
Thanks!
Eugen
[0] https://docs.ceph.com/en/reef/dev/radosgw/bucket_index/#index-transaction