Hello,

Did the customer delete the object by any chance? If yes, could this be related to https://tracker.ceph.com/issues/63935 ?

We hit a scenario where an application was issuing DELETEs and then listing the bucket entries. The listing still returned objects that should have been deleted, and the application then tried to GET them without success.

Regards,

Mathias Chapelain
Storage Engineer
Proton AG

On Friday, August 9th, 2024 at 08:54, Eugen Block <eblock@xxxxxx> wrote:

> Hi,
>
> I'm trying to help a customer with an RGW question; maybe someone here
> can help me out. Their S3 application reports errors every now and
> then, complaining about missing objects. This is what RGW logs:
>
> [08/Aug/2024:08:23:47.540 +0000] "HEAD
> /hchsarchiv/20240622221326-20540623-aeaa962adadf5bc92050823dd03039197987f9d16f70c793599e361b6a5910c83941a0ceb3c7bfccb0a8ecbae212c701958d8a316b4fb172a54040b26b3a2508 HTTP/1.1" 200 0 - "aws-sdk-dotnet-45/3.5.9.7 aws-sdk-dotnet-core/3.5.3.7 .NET_Runtime/4.0 .NET_Framework/4.0 OS/Microsoft_Windows_NT_10.0.14393.0 ClientSync" -
> latency=0.003999992s
>
> [08/Aug/2024:08:23:47.552 +0000] "GET
> /hchsarchiv/20240622221326-20540623-aeaa962adadf5bc92050823dd03039197987f9d16f70c793599e361b6a5910c83941a0ceb3c7bfccb0a8ecbae212c701958d8a316b4fb172a54040b26b3a2508 HTTP/1.1" 404 242 - "aws-sdk-dotnet-45/3.5.9.7 aws-sdk-dotnet-core/3.5.3.7 .NET_Runtime/4.0 .NET_Framework/4.0 OS/Microsoft_Windows_NT_10.0.14393.0 ClientSync" bytes=0-2097151
> latency=0.003999992s
>
> So apparently the HEAD request succeeds, but the GET request returns
> 404. We can confirm that the queried object indeed doesn't exist in
> the data pool, yet its metadata apparently must have been written
> successfully. Unfortunately, we don't have enough logs to find the
> corresponding PUT request; they have increased the logrotate retention
> so they can inspect the logs the next time it happens.
> But my question is: should they see some metadata in the
> listomapkeys/listomapvals output in the index pool?
> The docs [0] state this about index transactions:
>
> > Because the head objects are stored in different rados objects than
> > the bucket indices, we can't update both atomically with a single
> > rados operation. In order to satisfy the Consistency Guarantee for
> > listing operations, we have to coordinate these two object writes
> > using a three-step bucket index transaction:
> >
> > 1. Prepare a transaction on its bucket index object.
> > 2. Write or delete the head object.
> > 3. Commit the transaction on the bucket index object (or cancel the
> >    transaction if step 2 fails).
> >
> > Object writes and deletes may race with each other, so a given
> > object may have more than one prepared transaction at a time. RGW
> > considers an object entry to be 'pending' if there are any
> > outstanding transactions, or 'completed' otherwise.
>
> Could this be such a race condition which "just happens" from time to
> time? Or can it somehow be prevented from happening? Right now the
> cleanup process is a bit complicated application-wise.
> I'm not the most experienced RGW user, so I'd be grateful for any
> pointers here.
>
> Thanks!
> Eugen
>
> [0] https://docs.ceph.com/en/reef/dev/radosgw/bucket_index/#index-transaction
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
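To illustrate the pending/completed semantics quoted from the docs above, here is a minimal Python sketch of the three-step transaction. This is purely illustrative: the class and method names are made up, not RGW internals, and the actual index is stored as omap entries on rados objects rather than an in-memory dict.

```python
# Illustrative model of RGW's three-step bucket index transaction.
# All names here are hypothetical; this is NOT real RGW code.

class IndexEntry:
    """A bucket-index entry; 'pending' while any transaction is open."""
    def __init__(self):
        self.pending_ops = set()   # tags of open (prepared) transactions
        self.exists = False        # committed state of the head object

    @property
    def state(self):
        return "pending" if self.pending_ops else "completed"

class BucketIndex:
    def __init__(self):
        self.entries = {}

    def prepare(self, key, tag):
        # Step 1: record an open transaction on the index object.
        self.entries.setdefault(key, IndexEntry()).pending_ops.add(tag)

    def commit(self, key, tag, exists):
        # Step 3: close the transaction and record the head object's fate.
        entry = self.entries[key]
        entry.pending_ops.discard(tag)
        entry.exists = exists

    def cancel(self, key, tag):
        # Step 3 (failure path): drop the transaction, leave state unchanged.
        self.entries[key].pending_ops.discard(tag)

    def listing(self):
        # A listing includes committed objects AND pending entries, so an
        # object whose DELETE never committed can still appear here.
        return [k for k, e in self.entries.items()
                if e.exists or e.pending_ops]

idx = BucketIndex()
idx.prepare("obj1", "put-1")              # step 1
# step 2: head object written to the data pool (not modeled here)
idx.commit("obj1", "put-1", exists=True)  # step 3

idx.prepare("obj1", "del-1")   # a DELETE prepares its own transaction...
# ...but the gateway dies before committing: the entry stays 'pending'
print(idx.entries["obj1"].state)  # -> pending
print(idx.listing())              # -> ['obj1']: listed, though the head
                                  #    object may already be gone
```

The last two lines mirror the reported symptom: the index still carries a pending entry (visible via listomapkeys/listomapvals and in listings) even though the head object in the data pool may already have been deleted. Dir suggest (triggered by listing) or an index check is what eventually reconciles such stale entries.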