Re: rgw - unable to remove some orphans

Hi everyone,
we've got the same issue with our Ceph cluster (Pacific release), and we saw it for the first time when we started using the cluster as offload storage for Veeam Backup. At the end of the offload job, when Veeam tries to delete the oldest files, it reports an "unknown error" that turned out to be a failed multi-object delete. At first we suspected an S3 API implementation bug in the multi-delete request, but digging into the radosgw-admin commands we found the orphan list and discovered that we had a huge number (hundreds of thousands) of orphaned files. Our cluster has about 2.7TB of raw capacity, and roughly 50% of it is occupied by orphaned files.
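
In case it helps anyone reproduce the check, this is roughly what we ran (a sketch; .rgw.buckets is the pool name used later in this thread, so substitute your own RGW data pool):

$ rgw-orphan-list .rgw.buckets
$ wc -l orphan-list-*.out

The tool writes its findings to an orphan-list-&lt;timestamp&gt;.out file, one candidate RADOS object name per line.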

Is there a way to delete them safely? Or is it possible to change the garbage collector configuration to avoid accumulating these orphaned files in the first place?
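
If deleting everything on the orphan list is considered safe, I imagine the brute-force approach would be a loop like the one below (a sketch only, reusing Andrei's list file name from later in the thread; please sanity-check the list first, since a false positive here means deleting real data):

$ while IFS= read -r obj; do rados -p .rgw.buckets rm "$obj"; done < orphan-list-20230103105849.out

On the garbage collector side, the tunables that look relevant are rgw_gc_obj_min_wait, rgw_gc_processor_period and rgw_gc_max_objs, although as far as I understand the GC only processes objects it already knows about, so it may not prevent true orphans.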

Thank you all. I was pretty worried that the issue was caused by a mistake I made during the cluster setup 😊

Fabio



From: Andrei Mikhailovsky <andrei@xxxxxxxxxx>
Date: Tuesday, 3 January 2023 at 16:35
To: EDH <mriosfer@xxxxxxxxxxxxxxxx>
Cc: ceph-users <ceph-users@xxxxxxx>
Subject:  Re: rgw - unable to remove some orphans
Manuel,

Wow, I am pretty surprised to hear that the Ceph developers haven't addressed this issue already. It looks like a big problem, and keeping all this orphan data around unresolved is costing people a lot of money.

Could someone from the developers comment on the issue and let us know if there is a workaround?

Cheers

Andrei

----- Original Message -----
> From: "EDH" <mriosfer@xxxxxxxxxxxxxxxx>
> To: "Andrei Mikhailovsky" <andrei@xxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxx>
> Sent: Tuesday, 3 January, 2023 13:36:19
> Subject: RE: rgw - unable to remove some orphans

> The object index database gets corrupted and no one can fix it. We wiped a
> 500TB cluster years ago and moved off Ceph because of these orphan bugs.
> After moving all our data, we saw more than 100TB on disk that Ceph was unable
> to delete, also known as orphans... it makes no sense.
>
> We spent thousands of hours on this bug; in the end the best solution was to
> replicate the valid data to a new Ceph cluster, as sketched below.
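>
> For anyone planning the same migration, one way to replicate the valid data
> bucket by bucket between two RGW S3 endpoints is rclone (an illustration, not
> necessarily what we used; "oldceph" and "newceph" are hypothetical rclone
> remotes configured for each cluster's S3 endpoint):
>
> $ rclone sync oldceph:mybucket newceph:mybucket --checksum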
>
> Some providers work around this with 4x replication, but that makes no
> financial sense.
>
> Regards,
> Manuel
>
>
> -----Original Message-----
> From: Andrei Mikhailovsky <andrei@xxxxxxxxxx>
> Sent: martes, 3 de enero de 2023 13:46
> To: ceph-users <ceph-users@xxxxxxx>
> Subject:  rgw - unable to remove some orphans
>
> Happy New Year everyone!
>
> I have a bit of an issue removing some of the orphan objects identified by the
> rgw-orphan-list tool. Over the years rgw generated over 14 million orphans,
> wasting over 100TB, even though the data actually stored in rgw was well under
> 10TB at its peak. Anyway, I managed to remove around 12m objects over the
> holiday season, but just over 2m orphans could not be removed. Here is an
> example of one of the objects, taken from the orphan list file:
>
> $ rados -p .rgw.buckets rm 'default.775634629.1__multipart_SQL
> Backups/ALL-POND-LIVE_backup_2021_05_26_204508_8473183.d20210526-u200953.bak.s26895803904.zip.0e6LO9b4w9H3HepY-3IW_JSOaysLdFs.1_92'
>
> error removing .rgw.buckets>default.775634629.1__shadow_SQL
> Backups/ALL-POND-LIVE_backup_2021_05_26_204508_8473183.d20210526-u200953.bak.s26895803904.zip.0e6LO9b4w9H3HepY-3IW_JSOaysLdFs.1_92:
> (2) No such file or directory
>
> Checking for the object in the orphan list and in the raw rados listing (the
> .intermediate file) shows that the object is present in both:
>
> $ cat orphan-list-20230103105849.out |grep -a JSOaysLdFs |grep -a 92
> default.775634629.1__shadow_SQL
> Backups/ALL-POND-LIVE_backup_2021_05_26_204508_8473183.d20210526-u200953.bak.s26895803904.zip.0e6LO9b4w9H3HepY-3IW_JSOaysLdFs.1_92
>
> $ cat rados-20230103105849.intermediate |grep -a JSOaysLdFs |grep -a 92
> default.775634629.1__shadow_SQL
> Backups/ALL-POND-LIVE_backup_2021_05_26_204508_8473183.d20210526-u200953.bak.s26895803904.zip.0e6LO9b4w9H3HepY-3IW_JSOaysLdFs.1_92
>
>
> Why can't I remove it? I have around 2m objects which can't be removed. What can
> I do to remove them?
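>
> A couple of checks that come to mind (guesses on my part, not a confirmed
> diagnosis): stat the object directly, and see whether it lives in a RADOS
> namespace, since rm without the right namespace fails with ENOENT even though
> the name shows up in listings. &lt;namespace&gt; and &lt;object&gt; below are
> placeholders:
>
> $ rados -p .rgw.buckets stat 'default.775634629.1__shadow_SQL Backups/ALL-POND-LIVE_backup_2021_05_26_204508_8473183.d20210526-u200953.bak.s26895803904.zip.0e6LO9b4w9H3HepY-3IW_JSOaysLdFs.1_92'
> $ rados -p .rgw.buckets ls --all | grep -a JSOaysLdFs
> $ rados -p .rgw.buckets -N &lt;namespace&gt; rm '&lt;object&gt;'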
>
> Thanks
>
> Andrei
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



