Re: Ceph Octopus RGW - files vanished from rados while still in bucket index

"J. Eric Ivancich" <ivancich@xxxxxxxxxx> · Mon, 13 Jun 2022 18:29:29 -0400

There is no known bug that would cause the rados objects underlying an RGW object to be removed without a user requesting the RGW object be deleted.

There is a known bug where the bucket index may not be updated correctly after user-requested operations. So perhaps the user removed the RGW object, but it still incorrectly shows up in the bucket index. The PR fixing that bug was merged into the octopus branch, but only after 15.2.16 was released. See:

	https://github.com/ceph/ceph/pull/45902

So it should be in the next octopus release.
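To see which rados objects the bucket still references but the pool no longer holds, the two listings from the original report ("radosgw-admin bucket radoslist --bucket BUCKET" and "rados -p POOL ls") can be diffed. Below is a minimal Python sketch. It assumes both outputs were first saved to text files with one object name per line. The file names radoslist.txt and pool_ls.txt are hypothetical.

```python
"""Find rados objects the bucket references but the data pool lacks.

Assumes both inputs were captured beforehand (hypothetical file names):
    radosgw-admin bucket radoslist --bucket BUCKET > radoslist.txt
    rados -p POOL ls > pool_ls.txt
Each input is treated as one object name per line.
"""
import sys
from typing import Iterable, List


def missing_rados_objects(radoslist: Iterable[str], pool_ls: Iterable[str]) -> List[str]:
    """Return names from radoslist that are absent from the pool listing.

    Order of first appearance is kept; duplicates and blank lines are dropped.
    """
    present = {line.strip() for line in pool_ls if line.strip()}
    missing: List[str] = []
    seen = set()
    for line in radoslist:
        name = line.strip()
        if name and name not in present and name not in seen:
            seen.add(name)
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Usage: python find_missing.py radoslist.txt pool_ls.txt
    with open(sys.argv[1]) as f_radoslist, open(sys.argv[2]) as f_pool:
        for name in missing_rados_objects(f_radoslist, f_pool):
            print(name)
```

An empty result means every object the bucket references was found in the pool. A non-empty result lists candidates worth cross-checking against the bucket index entries.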

I also find it odd that a 250KB file gets a multipart object. What do we know about the original object? Do we know its size? Could the multipart upload never have completed? In that case there could be incomplete multipart entries in the bucket index, but they should never have been finalized into a regular bucket index entry.

Are you willing to share all the bucket index entries related to this object?
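Those entries can be pulled out of the JSON written by "radosgw-admin bi list --bucket BUCKET". The minimal Python sketch below makes two assumptions. First, the output is an array of objects that each carry an "idx" key. Second, entries tied to a multipart upload embed the object name somewhere in their idx. Both assumptions are worth checking against real output before relying on the result. The file name bi.json is hypothetical.

```python
"""Collect bucket index entries that mention a given object name.

Assumes the JSON array produced by (hypothetical file name):
    radosgw-admin bi list --bucket BUCKET > bi.json
where each element carries an "idx" key.
"""
import json
import sys
from typing import Any, Dict, List


def entries_for_object(entries: List[Dict[str, Any]], obj_name: str) -> List[Dict[str, Any]]:
    """Return every index entry whose "idx" contains obj_name.

    Substring matching is deliberate: it assumes multipart-related keys
    embed the object name among other text, so they are kept as well.
    Entries without an "idx" key are skipped.
    """
    return [e for e in entries if obj_name in str(e.get("idx", ""))]


if __name__ == "__main__":
    # Usage: python bi_filter.py bi.json OBJECT_NAME
    with open(sys.argv[1]) as f:
        all_entries = json.load(f)
    json.dump(entries_for_object(all_entries, sys.argv[2]), sys.stdout, indent=4)
    print()
```

Substring matching may also return entries for unrelated objects whose names contain the target name, so the output is a starting point for manual review rather than an exact answer.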

Eric
(he/him)

> On Jun 13, 2022, at 6:05 PM, Boris Behrens <bb@xxxxxxxxx> wrote:
> 
> Hi everybody,
> 
> are there other ways for rados objects to get removed, other than "rados -p
> POOL rm OBJECT"?
> We have a customer who has objects in the bucket index but can't download
> them. After checking, it seems the underlying rados data is gone.
> 
> The Ceph cluster is running Ceph Octopus 15.2.16.
> 
> "radosgw-admin bi list --bucket BUCKET" shows the object available.
> "radosgw-admin bucket radoslist --bucket BUCKET" shows the object and a
> corresponding multipart file.
> "rados -p POOL ls" only shows the object, but not the multipart file.
> 
> Exporting the rados object hands me an empty file.
> 
> I find it very strange that a 250KB file gets a multipart object, but I
> don't know how the customer uploaded the file or how they work with the RGW
> API.
> 
> What grinds my gears is that we lost customer data, and I need to know which
> paths can lead to this problem.
> 
> I know there is no recovery, but I am not satisfied with "well, it just
> happened. No idea why".
> As I am the only one working on the Ceph cluster, I would remove "removed
> via rados command" from the list of possibilities, as the last orphan
> objects cleanup was performed a month before the file's last MTIME.
> 
> Is there ANY way this could happen in connection with GC,
> restarting/adding/removing OSDs, sharding bucket indexes, OSD crashes or
> anything else? Anything that isn't "rados -p POOL rm OBJECT"?
> 
> Cheers
> Boris
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> 
