Re: Radosgw bucket check fix doesn't do anything

Hi Reid, 

You could try resharding this source bucket to 11 shards (new default since Octopus) and run the --fix command again to see how it goes. 
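
For reference, a rough sketch of that sequence (bucket name taken from your earlier messages; adjust the shard count as needed): 

# Reshard the bucket index to 11 shards, then re-run the index check/fix 
radosgw-admin bucket reshard --bucket=mimir-prod --num-shards=11 
radosgw-admin bucket check --check-objects --bucket=mimir-prod --fix 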

To stay on the safe side, you might want to plan a maintenance window and proceed as follows: 

1/ rclone sync the source bucket to the destination bucket again (see the rclone sketch below) 
2/ stop all clients accessing the source bucket (or block their access to the RGW port) 
3/ rclone sync the source bucket to the destination bucket again (with no activity on the bucket) 
4/ reshard and --fix the source bucket 
5/ check source bucket accessibility and usability 
6/ start all clients (or unblock their traffic) 

This way, if you see anything wrong at step 5, you can still destroy the source bucket, recreate it, rclone all data back from the destination to the source, and only then proceed to step 6. 
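
A minimal sketch of the rclone step (the 'src' and 'dst' remote names are placeholders for whatever remotes you have configured): 

# Mirror the source bucket to the destination; --checksum compares hashes rather than modtimes 
rclone sync src:mimir-prod dst:mimir-prod --checksum --progress 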

Regards, 
Frédéric. 

----- On 20 Sep 24, at 18:52, Reid Guyett <reid.guyett@xxxxxxxxx> wrote: 

> Hi,

> We are using 17.2.7 currently.

> FYI I tried the --fix command from a newer version and it crashes instantly.

>> podman run -it --rm -v /etc/ceph:/etc/ceph:ro quay.io/ceph/ceph:v18.2.4 /bin/bash
>> [root@7f786047ee20 /]# radosgw-admin bucket check --check-objects --bucket
>> mimir-prod --fix
>> ...
>> -16> 2024-09-20T16:21:50.562+0000 7f7f5233f840 5 monclient: authenticate
>> success, global_id 861984733
>> -15> 2024-09-20T16:21:50.562+0000 7f7f5233f840 10 monclient: _renew_subs
>> -14> 2024-09-20T16:21:50.562+0000 7f7f5233f840 10 monclient: _send_mon_message
>> to mon.sjc858 at v2:10.65.100.20:3300/0
>> -13> 2024-09-20T16:21:50.562+0000 7f7f5233f840 10 monclient: _renew_subs
>> -12> 2024-09-20T16:21:50.562+0000 7f7f5233f840 10 monclient: _send_mon_message
>> to mon.sjc858 at v2:10.65.100.20:3300/0
>> -11> 2024-09-20T16:21:50.562+0000 7f7f5233f840 1 librados: init done
>> -10> 2024-09-20T16:21:50.562+0000 7f7f5233f840 5 asok(0x55a8bdbe8000)
>> register_command cr dump hook 0x55a8bd9bd4b8
>> -9> 2024-09-20T16:21:50.569+0000 7f7f362b1640 4 mgrc handle_mgr_map Got map
>> version 219
>> -8> 2024-09-20T16:21:50.569+0000 7f7f362b1640 4 mgrc handle_mgr_map Active mgr
>> is now [v2:10.65.100.12:6896/1988445,v1:10.65.100.12:6897/1988445]
>> -7> 2024-09-20T16:21:50.569+0000 7f7f362b1640 4 mgrc reconnect Starting new
>> session with [v2:10.65.100.12:6896/1988445,v1:10.65.100.12:6897/1988445]
>> -6> 2024-09-20T16:21:50.569+0000 7f7f512e7640 10 monclient: get_auth_request con
>> 0x55a8be1fe000 auth_method 0
>> -5> 2024-09-20T16:21:50.571+0000 7f7f5233f840 5 note: GC not initialized
>> -4> 2024-09-20T16:21:50.571+0000 7f7f5233f840 5 asok(0x55a8bdbe8000)
>> register_command sync trace show hook 0x55a8bd92c7e0
>> -3> 2024-09-20T16:21:50.571+0000 7f7f5233f840 5 asok(0x55a8bdbe8000)
>> register_command sync trace history hook 0x55a8bd92c7e0
>> -2> 2024-09-20T16:21:50.571+0000 7f7f5233f840 5 asok(0x55a8bdbe8000)
>> register_command sync trace active hook 0x55a8bd92c7e0
>> -1> 2024-09-20T16:21:50.572+0000 7f7f5233f840 5 asok(0x55a8bdbe8000)
>> register_command sync trace active_short hook 0x55a8bd92c7e0
>> 0> 2024-09-20T16:21:50.835+0000 7f7f5233f840 -1 *** Caught signal (Floating
>> point exception) **
>> in thread 7f7f5233f840 thread_name:radosgw-admin

> I see https://tracker.ceph.com/issues/58330 which mentions this error on a bucket
> with 0 shards. This bucket does have 0 shards when I check the stats.
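
> For the record, this is roughly how the shard count can be read (the exact field layout may vary between releases):

>> # num_shards is reported in the bucket stats output on recent releases
>> radosgw-admin bucket stats --bucket mimir-prod | grep num_shards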

> TBH I'm pretty sure there are tons and tons of leftover rados objects in our
> cluster. The radosgw service has crashed so many times since the inception of
> this cluster (50x a day for months).

> I will take a look at cleaning up the entries manually but it would be nice if
> the admin tool did some of it. Even though your steps should be pretty error
> resistant since there isn't much typing involved, deleting objects at the rados
> level is a bit scary for me.

> Thanks!

> On Fri, Sep 20, 2024 at 4:44 AM Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> wrote:

>> Hi Reid,

>> Only the metadata / index side. "invalid_multipart_entries" relates to multipart
>> index entries that no longer have a corresponding .meta index entry (the entry
>> that lists all parts of a multipart upload).
>> The --fix should have removed these multipart index entries from the bucket
>> index and updated the header object with newly calculated stats [1], but
>> it obviously failed to do so.
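
>> For context, each multipart upload normally has a ".meta" entry in the bucket index alongside its per-part
>> entries. A rough way to eyeball them (a sketch: the pool name is an example, and the index object is
>> ".dir.<bucket_id>", plus a shard suffix on sharded buckets):

>> rados -p default.rgw.buckets.index listomapkeys .dir.<bucket_id> | grep '\.meta$'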

>> You may be facing this bug [2]. If that's the case, then upgrading your cluster
>> and running the tool again may help. Which version of Ceph is this, btw?

>> In the past, we've used the following procedure to manually remove orphaned
>> multipart entries from the bucket index when they couldn't be cleaned up normally
>> because the rados data objects (multipart parts) were missing:

>> # Generate list of multipart objects
>> aws s3api list-multipart-uploads --endpoint-url https://s3.peta.univ-lorraine.fr:9443 --bucket $bucket_name > list-multipart-uploads.txt

>> # Get bucket ID
>> bucket_id=$(radosgw-admin bucket stats --bucket=$bucket_name | grep '"id"' | cut -d '"' -f 4)

>> # List all shards
>> rados -p $index_pool_name ls | grep "$bucket_id" | sort -n -t '.' -k6

>> # Get all shards with their list of omap keys
>> mkdir "$bucket_id"
>> for i in $(rados -p $index_pool_name ls | grep "$bucket_id"); do echo $i ; rados -p $index_pool_name listomapkeys $i > "${bucket_id}/${i}" ; done

>> # Get all UploadId
>> grep '"UploadId"' list-multipart-uploads.txt | cut -d '"' -f 4 > UploadIds.txt

>> # Identify which UploadId belongs to which shard(s)
>> fgrep -f UploadIds.txt ${bucket_id}/.dir.${bucket_id}* | sed -e "s/^${bucket_id}\///g" > UploadId-to-shard.txt

>> # Cleanup these entries
>> while IFS=':' read -r object key ; do echo "Removing Key ${key}" ; rados -p ${index_pool_name} rmomapkey "${object}" "${key}" ; done < UploadId-to-shard.txt > rmomapkey.log

>> The difference with your case is that we could list them with 'aws s3api
>> list-multipart-uploads', but maybe you can identify the omap keys to remove
>> based on the 'invalid_multipart_entries' list.
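
>> As a rough sketch of that adaptation (untested; it reuses the shard dumps and variables from the procedure
>> above and assumes the reported entries match the omap key names verbatim):

>> # Save the invalid entries reported by the bucket check
>> radosgw-admin bucket check --bucket $bucket_name | jq -r '.invalid_multipart_entries[]' > invalid_entries.txt
>> # Map each invalid entry to its index shard, then remove the matching omap keys
>> fgrep -f invalid_entries.txt ${bucket_id}/.dir.${bucket_id}* | sed -e "s/^${bucket_id}\///g" > invalid-to-shard.txt
>> while IFS=':' read -r object key ; do rados -p ${index_pool_name} rmomapkey "${object}" "${key}" ; done < invalid-to-shard.txt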

>> Besides, after cleaning up the index, you may want to run the rgw-orphan-list
>> tool [3] to identify any orphaned multipart objects left in the data pool and
>> remove them with a 'rados rm' command.
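
>> A minimal sketch of that last step (the pool name is an example, and <orphan-list-file> stands for whatever
>> output file the tool reports; review that list carefully before removing anything):

>> rgw-orphan-list default.rgw.buckets.data
>> # after careful review of the generated list:
>> while read -r obj ; do rados -p default.rgw.buckets.data rm "$obj" ; done < <orphan-list-file>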

>> Good luck,
>> Frédéric.

>> [1] https://www.ibm.com/docs/en/storage-ceph/7?topic=management-managing-bucket-index-entries
>> [2] https://tracker.ceph.com/issues/53874
>> [3] https://access.redhat.com/solutions/4544621

>> ----- On 19 Sep 24, at 16:34, Reid Guyett <reid.guyett@xxxxxxxxx> wrote:

>>> Hi,

>>> I didn't notice any changes in the counts after running either 'check --fix' or
>>> 'check --check-objects --fix'. Also, the bucket isn't versioned.

>>> I will take a look at the index vs the radoslist. Which side would cause the
>>> 'invalid_multipart_entries'?
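
>>> For anyone curious, the comparison I have in mind is roughly this (a sketch; the data pool name and
>>> $bucket_marker are placeholders):

>>> # What the bucket index/manifests say should exist in RADOS
>>> radosgw-admin bucket radoslist --bucket mimir-prod | sort > expected.txt
>>> # What is actually present in the data pool for this bucket's marker
>>> rados -p default.rgw.buckets.data ls | grep "$bucket_marker" | sort > actual.txt
>>> diff expected.txt actual.txt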

>>> Thanks

>>> On Thu, Sep 19, 2024 at 5:50 AM Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> wrote:

>>>> Oh, by the way, since 35470 is nearly twice 18k, couldn't it be that the
>>>> source bucket is versioned and the destination bucket only got the most recent
>>>> copy of each object?
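
>>>> A quick way to check that, in case it's useful (profile and endpoint reused from your earlier commands):

>>>> aws --profile mimir-prod --endpoint-url https://my.objectstorage.domain s3api get-bucket-versioning --bucket mimir-prod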

>>>> Regards,
>>>> Frédéric.

>>>> ----- On 18 Sep 24, at 20:39, Reid Guyett <reid.guyett@xxxxxxxxx> wrote:

>>>>> Hi Frederic,
>>>>> Thanks for those notes.

>>>>> When I scan the list of multiparts, I do not see the items from the invalid
>>>>> multipart list. Example:
>>>>> The first entry here

>>>>>> radosgw-admin bucket check --bucket mimir-prod | head
>>>>>> {
>>>>>> "invalid_multipart_entries": [
>>>>>> "_multipart_network/01H9CFRA45MJWBHQRCHRR4JHV4/index.sJRTCoqiZvlge2cjz6gLU7DwuLI468zo.2",
>>>>>> ...

>>>>> does not appear in the abort-multipart-upload.txt I get from the
>>>>> list-multipart-uploads

>>>>>> $ grep -c 01H9CFRA45MJWBHQRCHRR4JHV4 abort-multipart-upload.txt
>>>>>> 0

>>>>> If I try to abort the invalid multipart, it says it does not exist.

>>>>>> $ aws --profile mimir-prod --endpoint-url https://my.objectstorage.domain s3api abort-multipart-upload --bucket mimir-prod --key "network/01H9CFRA45MJWBHQRCHRR4JHV4/index" --upload-id "sJRTCoqiZvlge2cjz6gLU7DwuLI468zo.2"

>>>>>> An error occurred (NoSuchUpload) when calling the AbortMultipartUpload
>>>>>> operation: Unknown
>>>>> I seem to have many buckets with this type of state. I'm hoping to be able to
>>>>> fix them.
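
>>>>> To gauge the scope, something like this (an untested sketch) should give a rough count of suspect
>>>>> multipart index entries per bucket:

>>>>>> for b in $(radosgw-admin bucket list | jq -r '.[]') ; do
>>>>>>   echo "$b: $(radosgw-admin bucket check --bucket "$b" | grep -c '_multipart_')"
>>>>>> done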

>>>>> Thanks!

>>>>> On Wed, Sep 18, 2024 at 4:21 AM Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> wrote:

>>>>>> Hi Reid,

>>>>>> The bucket check --fix will not clean up aborted multipart uploads. An S3 client
>>>>>> will.

>>>>>> You need to either set a Lifecycle policy on buckets to have these cleaned up
>>>>>> automatically after some time

>>>>>> ~/ cat /home/lifecycle.xml
>>>>>> <LifecycleConfiguration>
>>>>>>   <Rule>
>>>>>>     <AbortIncompleteMultipartUpload>
>>>>>>       <DaysAfterInitiation>3</DaysAfterInitiation>
>>>>>>     </AbortIncompleteMultipartUpload>
>>>>>>     <Prefix></Prefix>
>>>>>>     <Status>Enabled</Status>
>>>>>>   </Rule>
>>>>>> </LifecycleConfiguration>

>>>>>> ~/ s3cmd setlifecycle lifecycle.xml s3://bucket-test
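
>>>>>> If you prefer the aws CLI over s3cmd, the equivalent call (a sketch, with the same rule expressed as JSON)
>>>>>> would look something like:

>>>>>> aws s3api put-bucket-lifecycle-configuration --endpoint-url https://my.objectstorage.domain --bucket bucket-test \
>>>>>>   --lifecycle-configuration '{"Rules":[{"ID":"abort-incomplete-mpu","Status":"Enabled","Filter":{"Prefix":""},"AbortIncompleteMultipartUpload":{"DaysAfterInitiation":3}}]}'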

>>>>>> Or get rid of them manually by using an s3 client

>>>>>> ~/ aws s3api list-multipart-uploads --endpoint-url https://my.objectstorage.domain --bucket mimir-prod | jq -r '.Uploads[] | "--key \"\(.Key)\" --upload-id \(.UploadId)"' > abort-multipart-upload.txt

>>>>>> ~/ max=$(cat abort-multipart-upload.txt | wc -l); i=1; while read -r line; do
>>>>>> echo -n "$i/$max"; ((i=i+1)); eval "aws s3api abort-multipart-upload --endpoint-url https://my.objectstorage.domain --bucket mimir-prod $line"; done < abort-multipart-upload.txt

>>>>>> Regards,
>>>>>> Frédéric.

>>>>>> ----- On 17 Sep 24, at 14:27, Reid Guyett <reid.guyett@xxxxxxxxx> wrote:

>>>>>> > Hello,

>>>>>> > I recently moved a bucket from one cluster to another using rclone. I
>>>>>> > noticed that the source bucket had around 35k objects and the destination
>>>>>> > bucket only had around 18k objects after the sync completed.

>>>>>> > Source bucket stats showed:

>>>>>> >> radosgw-admin bucket stats --bucket mimir-prod | jq .usage
>>>>>> >> {
>>>>>> >> "rgw.main": {
>>>>>> >> "size": 4321515978174,
>>>>>> >> "size_actual": 4321552605184,
>>>>>> >> "size_utilized": 4321515978174,
>>>>>> >> "size_kb": 4220230448,
>>>>>> >> "size_kb_actual": 4220266216,
>>>>>> >> "size_kb_utilized": 4220230448,
>>>>>> >> "num_objects": 35470
>>>>>> >> },
>>>>>> >> "rgw.multimeta": {
>>>>>> >> "size": 0,
>>>>>> >> "size_actual": 0,
>>>>>> >> "size_utilized": 66609,
>>>>>> >> "size_kb": 0,
>>>>>> >> "size_kb_actual": 0,
>>>>>> >> "size_kb_utilized": 66,
>>>>>> >> "num_objects": 2467
>>>>>> >> }
>>>>>> >> }

>>>>>> > Destination bucket stats showed:

>>>>>> >> radosgw-admin bucket stats --bucket mimir-prod | jq .usage
>>>>>> >> {
>>>>>> >> "rgw.main": {
>>>>>> >> "size": 4068176326491,
>>>>>> >> "size_actual": 4068212576256,
>>>>>> >> "size_utilized": 4068176326491,
>>>>>> >> "size_kb": 3972828444,
>>>>>> >> "size_kb_actual": 3972863844,
>>>>>> >> "size_kb_utilized": 3972828444,
>>>>>> >> "num_objects": 18525
>>>>>> >> },
>>>>>> >> "rgw.multimeta": {
>>>>>> >> "size": 0,
>>>>>> >> "size_actual": 0,
>>>>>> >> "size_utilized": 108,
>>>>>> >> "size_kb": 0,
>>>>>> >> "size_kb_actual": 0,
>>>>>> >> "size_kb_utilized": 1,
>>>>>> >> "num_objects": 4
>>>>>> >> }
>>>>>> >> }

>>>>>> > When I checked the source bucket using the aws CLI tool, it showed around 18k
>>>>>> > objects. The bucket was actively being used, so the 18k count differs
>>>>>> > slightly.

>>>>>> >> aws --profile mimir-prod --endpoint-url https://my.objectstorage.domain s3api list-objects --bucket mimir-prod > mimir_objs
>>>>>> >> cat mimir_objs | grep -c "Key"
>>>>>> >> 18090

>>>>>> > I did a check on the source bucket and it showed a lot of invalid
>>>>>> > multipart objects.

>>>>>> >> radosgw-admin bucket check --bucket mimir-prod | head
>>>>>> >> {
>>>>>> >> "invalid_multipart_entries": [

>>>>>> >> "_multipart_network/01H9CFRA45MJWBHQRCHRR4JHV4/index.sJRTCoqiZvlge2cjz6gLU7DwuLI468zo.2",

>>>>>> >> "_multipart_network/01HMCCRMTC5F4BFCZ56BKHTMWQ/index.6ypGbeMr6Jg3y7xAL8yrLL-v4sbFzjSA.3",

>>>>>> >> "_multipart_network/01HMFKR56RRZNX9VT9B4F49MMD/chunks/000001.JIC7fFA_q96nal1yGXsVSPCY8EMe5AU8.2",

>>>>>> >> "_multipart_network/01HMFKSND2E5BWF6QVTX8SDRRQ/index.57aSNeXn3j70H4EHfbNCD2RpoOp-P1Bv.2",

>>>>>> >> "_multipart_network/01HMFKTDNA3FVSWW7N8KYY2C7N/chunks/000001.2~kRjRbLWWDf1e40P40LUzdU3f_x2P46Q.2",

>>>>>> >> "_multipart_network/01HMFTMA8J1DEXYHKMVCXCC0GM/chunks/000001.GVajdCja0gHOLlgyFanF72A4B6ZqUpu5.2",

>>>>>> >> "_multipart_network/01HMFTMA8J1DEXYHKMVCXCC0GM/chunks/000001.GYaouEePvEdbQosCb5jLFCAHrSm9VoDh.2",

>>>>>> >> "_multipart_network/01HMFTMA8J1DEXYHKMVCXCC0GM/chunks/000001.r4HkP-JK-rBAWDoXBXKJJYEAjk39AswW.1",
>>>>>> >> ...

>>>>>> > So I tried to run `radosgw-admin bucket check --check-objects --bucket
>>>>>> > mimir-prod --fix` and it showed that it was cleaning things with thousands
>>>>>> > of lines like

>>>>>> >> 2024-09-17T12:19:42.212+0000 7fea25b6f9c0 0 check_disk_state(): removing
>>>>>> >> manifest part from index:
>>>>>> >> mimir-prod:_multipart_tenant_prod/01J7Q778YXJXE23SRQZM9ZA4NH/chunks/000001.2~m6EI5fHFWxI-RmWB6TeFSupu7vVrCgh.2
>>>>>> >> 2024-09-17T12:19:42.212+0000 7fea25b6f9c0 0 check_disk_state(): removing
>>>>>> >> manifest part from index:
>>>>>> >> mimir-prod:_multipart_tenant_prod/01J7Q778YXJXE23SRQZM9ZA4NH/chunks/000001.2~m6EI5fHFWxI-RmWB6TeFSupu7vVrCgh.3
>>>>>> >> 2024-09-17T12:19:42.212+0000 7fea25b6f9c0 0 check_disk_state(): removing
>>>>>> >> manifest part from index:
>>>>>> >> mimir-prod:_multipart_tenant_prod/01J7Q778YXJXE23SRQZM9ZA4NH/chunks/000001.2~m6EI5fHFWxI-RmWB6TeFSupu7vVrCgh.4
>>>>>> >> 2024-09-17T12:19:42.212+0000 7fea25b6f9c0 0 check_disk_state(): removing
>>>>>> >> manifest part from index:
>>>>>> >> mimir-prod:_multipart_tenant_prod/01J7Q778YXJXE23SRQZM9ZA4NH/chunks/000001.2~m6EI5fHFWxI-RmWB6TeFSupu7vVrCgh.5
>>>>>> >> 2024-09-17T12:19:42.213+0000 7fea25b6f9c0 0 check_disk_state(): removing
>>>>>> >> manifest part from index:
>>>>>> >> mimir-prod:_multipart_tenant_prod/01J7Q778YXJXE23SRQZM9ZA4NH/chunks/000001.2~m6EI5fHFWxI-RmWB6TeFSupu7vVrCgh.6

>>>>>> > but the end result shows nothing has changed.

>>>>>> >> "check_result": {
>>>>>> >> "existing_header": {
>>>>>> >> "usage": {
>>>>>> >> "rgw.main": {
>>>>>> >> "size": 4281119287051,
>>>>>> >> "size_actual": 4281159110656,
>>>>>> >> "size_utilized": 4281119287051,
>>>>>> >> "size_kb": 4180780554,
>>>>>> >> "size_kb_actual": 4180819444,
>>>>>> >> "size_kb_utilized": 4180780554,
>>>>>> >> "num_objects": 36429
>>>>>> >> },
>>>>>> >> "rgw.multimeta": {
>>>>>> >> "size": 0,
>>>>>> >> "size_actual": 0,
>>>>>> >> "size_utilized": 66636,
>>>>>> >> "size_kb": 0,
>>>>>> >> "size_kb_actual": 0,
>>>>>> >> "size_kb_utilized": 66,
>>>>>> >> "num_objects": 2468
>>>>>> >> }
>>>>>> >> }
>>>>>> >> },
>>>>>> >> "calculated_header": {
>>>>>> >> "usage": {
>>>>>> >> "rgw.main": {
>>>>>> >> "size": 4281119287051,
>>>>>> >> "size_actual": 4281159110656,
>>>>>> >> "size_utilized": 4281119287051,
>>>>>> >> "size_kb": 4180780554,
>>>>>> >> "size_kb_actual": 4180819444,
>>>>>> >> "size_kb_utilized": 4180780554,
>>>>>> >> "num_objects": 36429
>>>>>> >> },
>>>>>> >> "rgw.multimeta": {
>>>>>> >> "size": 0,
>>>>>> >> "size_actual": 0,
>>>>>> >> "size_utilized": 66636,
>>>>>> >> "size_kb": 0,
>>>>>> >> "size_kb_actual": 0,
>>>>>> >> "size_kb_utilized": 66,
>>>>>> >> "num_objects": 2468
>>>>>> >> }
>>>>>> >> }
>>>>>> >> }
>>>>>> >> }

>>>>>> > Does this command do anything? Is it the wrong command for this issue? How
>>>>>> > does one go about fixing buckets in this state?

>>>>>> > Thanks!

>>>>>> > Reid
>>>>>> > _______________________________________________
>>>>>> > ceph-users mailing list -- ceph-users@xxxxxxx
>>>>>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



