That's interesting, thanks for the link to the tracker issue. There's
definitely a chance that the object was deleted by the application,
but we don't have enough logs right now to confirm that. They don't
have much insight into the application either, so it can be difficult
to get to the bottom of this. I'll keep an eye on it, though, because
it will most likely happen again, albeit not very frequently. So
hopefully with more logs we'll be able to debug this a bit better.
Thanks!
Eugen
Quoting Mathias Chapelain <mathias.chapelain@xxxxxxxxx>:
Hello,
Did the customer delete the object by any chance? If yes, could
this be related to https://tracker.ceph.com/issues/63935?
We had a scenario where an application was issuing DELETEs and then
listing the bucket entries. The listing still returned objects that
should have been deleted, and the application then tried to GET them
without success.
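In case it helps, the access pattern was roughly like this (a boto3
sketch of the sequence; endpoint, bucket and key names are made up):

# Rough reconstruction of the application's access pattern.
import boto3

s3 = boto3.client('s3', endpoint_url='https://rgw.example.com')

s3.delete_object(Bucket='mybucket', Key='some/object')

# A later listing still returned the deleted key...
listing = s3.list_objects_v2(Bucket='mybucket')
for entry in listing.get('Contents', []):
    try:
        s3.get_object(Bucket='mybucket', Key=entry['Key'])
    except s3.exceptions.NoSuchKey:
        # ...and GETs on such keys failed with 404 (NoSuchKey).
        print("listed but not readable:", entry['Key'])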
Regards,
Mathias Chapelain
Storage Engineer
Proton AG
On Friday, August 9th, 2024 at 08:54, Eugen Block <eblock@xxxxxx> wrote:
Hi,
I'm trying to help a customer with an RGW question, maybe someone here
can help me out. Their S3 application reports errors every now and
then, complaining about missing objects. This is what the RGW logs
show:
[08/Aug/2024:08:23:47.540 +0000] "HEAD
/hchsarchiv/20240622221326-20540623-aeaa962adadf5bc92050823dd03039197987f9d16f70c793599e361b6a5910c83941a0ceb3c7bfccb0a8ecbae212c701958d8a316b4fb172a54040b26b3a2508 HTTP/1.1" 200 0 - "aws-sdk-dotnet-45/3.5.9.7 aws-sdk-dotnet-core/3.5.3.7 .NET_Runtime/4.0 .NET_Framework/4.0 OS/Microsoft_Windows_NT_10.0.14393.0 ClientSync"
-
latency=0.003999992s
[08/Aug/2024:08:23:47.552 +0000] "GET
/hchsarchiv/20240622221326-20540623-aeaa962adadf5bc92050823dd03039197987f9d16f70c793599e361b6a5910c83941a0ceb3c7bfccb0a8ecbae212c701958d8a316b4fb172a54040b26b3a2508 HTTP/1.1" 404 242 - "aws-sdk-dotnet-45/3.5.9.7 aws-sdk-dotnet-core/3.5.3.7 .NET_Runtime/4.0 .NET_Framework/4.0 OS/Microsoft_Windows_NT_10.0.14393.0 ClientSync"
bytes=0-2097151
latency=0.003999992s
So apparently the HEAD request succeeds, but the GET returns 404. We
can confirm that the queried object indeed doesn't exist in the data
pool, so the object metadata must have been written successfully
while the data was not. Unfortunately, we don't have enough logs to
find the corresponding PUT request; they just increased the logrotate
retention so we can inspect it the next time it happens. My question
is: should they see some metadata for such an object in the
listomapkeys/listomapvals output in the index pool?
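Something along these lines is what I have in mind (a python3-rados
sketch; the pool name and index object name are placeholders, the
index shard object should be .dir.<bucket-id>[.<shard>] in the index
pool):

# Sketch: dump the omap entries of one bucket index shard object,
# roughly equivalent to `rados listomapvals`. Names are placeholders.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('default.rgw.buckets.index')  # assumed pool name
    with rados.ReadOpCtx() as op:
        it, ret = ioctx.get_omap_vals(op, "", "", 1000)
        ioctx.operate_read_op(op, '.dir.<bucket-id>.0')  # placeholder shard object
        for key, val in it:
            print(key, len(val), "bytes")
finally:
    cluster.shutdown()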
The docs [0] state this about Index Transactions:
> Because the head objects are stored in different rados objects than
> the bucket indices, we can’t update both atomically with a single
> rados operation. In order to satisfy the Consistency Guarantee for
> listing operations, we have to coordinate these two object writes
> using a three-step bucket index transaction:
>
> 1. Prepare a transaction on its bucket index object.
> 2. Write or delete the head object.
> 3. Commit the transaction on the bucket index object (or cancel the
> transaction if step 2 fails).
>
> Object writes and deletes may race with each other, so a given
> object may have more than one prepared transaction at a time. RGW
> considers an object entry to be ‘pending’ if there are any
> outstanding transactions, or ‘completed’ otherwise.
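If I read that correctly, the write path is roughly the following (a
toy simulation of my understanding, not actual RGW code):

# Toy simulation of the three-step index transaction: the index entry
# goes 'pending', then the head object is written, then 'completed'
# (or the prepared entry is rolled back if the head write fails).
index = {}   # stand-in for the bucket index shard's omap
data = {}    # stand-in for head objects in the data pool

def put_object(name, payload, fail_head_write=False):
    index[name] = 'pending'            # 1. prepare on the index object
    try:
        if fail_head_write:
            raise IOError("head write failed")
        data[name] = payload           # 2. write the head object
    except IOError:
        del index[name]                # cancel the prepared transaction
        raise
    index[name] = 'completed'          # 3. commit on the index object

put_object('obj-a', b'...')
print(index, list(data))               # obj-a listed as 'completed'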
Could this be such a race condition that "just happens" from time to
time? Or can it somehow be prevented from happening? Right now the
cleanup process is a bit complicated on the application side.
I'm not the most experienced RGW user, so I'd be grateful for any
pointers here.
Thanks!
Eugen
[0]
https://docs.ceph.com/en/reef/dev/radosgw/bucket_index/#index-transaction
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx