Hi Felix,

Responses inline.

----- On 5 Feb 25, at 8:40, Felix Stolte <f.stolte@xxxxxxxxxxxxx> wrote:

> Hi Frédéric,

>> Can you try the command below?

>> $ rados -p mailbox listsnaps rbd_data.26f7c5d05af621.0000000000002adf
>> rbd_data.26f7c5d05af621.0000000000002adf:
>> cloneid  snaps  size     overlap
>> 3        3      4194304  []
>> head     -      4194304     <---- Do you see this line?

> $ rados -p mailbox listsnaps rbd_data.26f7c5d05af621.0000000000002adf
> rbd_data.26f7c5d05af621.0000000000002adf:
> cloneid  snaps        size     overlap
> 27534    27523,27534  4194304  [0~1904640]
> 27553    27553        4194304  [0~1904640]
> 27636    27599,27636  4194304  [368640~3825664]
> 27673    27673        4194304  [1126400~3067904]
> 27710    27710        4194304  [1646592~2547712]
> 27721    27721        4194304  [1634304~2560000]
> 27732    27732        4194304  [1695744~2498560]
> 27743    27743        4194304  [1695744~2498560]
> 27780    27780        4194304  []

> And

> $ rados -p cephfs_data_ec listsnaps 100198218f2.0000008a
> 100198218f2.0000008a:
> cloneid  snaps  size     overlap
> 4762     4705   4194304  []

>> If you don't see the 'head' line, then you're probably facing the orphan clones
>> (AKA leaked snapshots) bug described in [1], which was fixed by [2].

> Looks like we are affected by https://tracker.ceph.com/issues/64646 on both
> pools. We are currently on 17.2.7, while the fix is in 17.2.8.

Yep.

>> To get rid of these orphan clones, you need to run the command below on the
>> pool. It requeues the orphan objects for snap trimming. See [3] for details.

>> $ ceph osd pool force-remove-snap cephfs_data_ec

>> Note that:

>> 1. There's a --dry-run option you can use.
>> 2. This command should only be run ONCE. Running it twice or more is useless and
>> can lead to OSDs crashing (restarting fine, but still... crashing at the same
>> time).

> This sounds a little bit scary, to be honest. Do you mean OSDs can crash once and
> restart fine, or do they crash, restart, crash, …?

We've seen OSDs crash and restart just fine (no crash loop whatsoever) when running the 'force-remove-snap' command on a test pool ** that still had snapshots **. Corner case... So I would recommend removing all remaining snapshots on these 2 pools (if any) before you run the 'force-remove-snap' command.

No worries, you should be fine. When running the 'force-remove-snap' command, you should see all PGs trimming snaps and the number of CLONES (in the 'rados df' output) going back to 0.

Frédéric.

>> Let us know how it goes.

>> Cheers,
>> Frédéric.

>> [1] https://tracker.ceph.com/issues/64646
>> [2] https://github.com/ceph/ceph/pull/55841
>> [3] https://github.com/ceph/ceph/pull/53545

>> ----- On 31 Jan 25, at 9:54, Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> wrote:

>>> Hi Felix,

>>> This is weird. The most likely explanation is that this resulted from a bug that
>>> has certainly been fixed.

>>> What you could try as a starter is a ceph-bluestore-tool fsck [1] on the
>>> primary OSD of rbd_data.26f7c5d05af621.0000000000002adf to see if it lists any
>>> inconsistencies:

>>> $ cephadm shell --name osd.$i --fsid $(ceph fsid) ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-$i/ --deep yes

>>> If it doesn't, then I suppose you'll have to get rid of these objects you can't
>>> 'rados rm' with the help of ceph-objectstore-tool <remove> [2].

>>> Regards,
>>> Frédéric.

>>> [1] https://docs.ceph.com/en/latest/man/8/ceph-bluestore-tool/
>>> [2] https://docs.ceph.com/en/latest/man/8/ceph-objectstore-tool/#removing-an-object

>>> ----- On 31 Jan 25, at 8:29, Felix Stolte <f.stolte@xxxxxxxxxxxxx> wrote:

>>>> Hi Frederic,

>>>> thanks for your suggestions. I took a look at all the objects and discovered the
>>>> following:

>>>> 1. rbd_id.<image_name>, rbd_header.<image_id> and rbd_object_map.<image_id>
>>>> exist only for the 3 images listed by 'rbd ls' (I created a test image
>>>> yesterday).
>>>> 2. There are about 30 different image_ids with rbd_data.<image_id> objects which
>>>> do not have an id, header or object_map object.

>>>> After that I tried to get the stats of one of the orphaned objects:

>>>> rados -p mailbox stat rbd_data.26f7c5d05af621.0000000000002adf
>>>> error stat-ing mailbox/rbd_data.26f7c5d05af621.0000000000002adf: (2) No such
>>>> file or directory

>>>> I double-checked that the object name is the one listed by 'rados ls'. What makes
>>>> it worse is that I can neither stat, get nor rm the objects, while they are still
>>>> counted towards disk usage. We will remove the whole pool for sure, but I would
>>>> really like to get to the cause of this to prevent it from happening again.

>>>>> On 30.01.2025 at 10:54, Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> wrote:

>>>>> Hi Felix,

>>>>> Every rbd_data object belongs to an image that should have:

>>>>> - an rbd_id.<image_name> object containing the image id (which you can get with
>>>>> rbd info)
>>>>> - an rbd_header object with omap attributes you can list with listomapvals

>>>>> To identify the image name these rbd_data objects belong(ed) to, you could list
>>>>> all rbd_id objects in that pool and, for each one of them, print the image name
>>>>> and the image id with the command below:

>>>>> $ for rbd_id in $(rados -p $poolname ls | grep rbd_id) ; do echo "$(echo $rbd_id | cut -d '.' -f2) : $(rados -p $poolname get $rbd_id - | strings)" ; done
>>>>> image2 : 2a733b30debc84
>>>>> image1 : 28d4fc1dddd922

>>>>> It might take some time, but you'd get a clearer view of what these rbd_data
>>>>> objects refer(red) to. Also, if you can decode the timestamps in the 'rados -p rbd
>>>>> listomapvals rbd_header.<id> | strings' output, you could find out when each image
>>>>> was created and last accessed.

>>>>> Hope that helps.

>>>>> Regards,
>>>>> Frédéric.

>>>>> PS: If you're moving away from iSCSI and only have 2 remaining images in this
>>>>> pool, you may also wait until these images are no longer in use and then detach
>>>>> them and remove the whole pool.

>>>>> ----- On 30 Jan 25, at 9:09, Felix Stolte <f.stolte@xxxxxxxxxxxxx> wrote:

>>>>>> Hi Frederic,

>>>>>> there is no namespace. The pool in question has the application rbd, but is not
>>>>>> the default pool named 'rbd'.

>>>>>>> On 29.01.2025 at 11:24, Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> wrote:

>>>>>>> Hi Felix,

>>>>>>> Any RADOS namespaces in that pool? You can check using either:

>>>>>>> rbd namespace ls rbd

>>>>>>> or

>>>>>>> rados stat -p rbd rbd_namespace && rados -p rbd listomapvals rbd_namespace

>>>>>>> The rbd_data objects might be linked to namespaced images that can only be
>>>>>>> listed using the command: rbd ls --namespace <namespace>

>>>>>>> I suggest checking this because the 'rbd' pool has historically been Ceph's
>>>>>>> default RBD pool, long before iSCSI began using it (in its hardcoded
>>>>>>> implementation).

>>>>>>> Might be worth checking this before taking any action.

>>>>>>> Regards,
>>>>>>> Frédéric.

>>>>>>> ----- On 29 Jan 25, at 8:53, Felix Stolte <f.stolte@xxxxxxxxxxxxx> wrote:

>>>>>>>> Hi Alexander,

>>>>>>>> trash is empty and rbd ls only lists two images, with the prefixes
>>>>>>>> rbd_data.1af561611d24cf and rbd_data.ed93e6548ca56b

>>>>>>>> rados ls gives:
>>>>>>>> rbd_data.d1b81247165450.00000000000055d2
>>>>>>>> rbd_data.32de606b8b4567.0000000000012f2f
>>>>>>>> rbd_data.ed93e6548ca56b.00000000000eef03
>>>>>>>> rbd_data.26f7c5d05af621.0000000000002adf
>>>>>>>> ….

>>>>>>>> On 28.01.2025 at 22:46, Alexander Patrakov <patrakov@xxxxxxxxx> wrote:

>>>>>>>> Hi Felix,

>>>>>>>> A dumb answer first: if you know the image names, have you tried "rbd
>>>>>>>> rm $pool/$imagename"? Or is there any reason, like concerns about
>>>>>>>> iSCSI control data integrity, that prevents you from trying that?

>>>>>>>> Also, have you checked the rbd trash?

>>>>>>>> On Tue, Jan 28, 2025 at 5:43 PM Stolte, Felix <f.stolte@xxxxxxxxxxxxx> wrote:

>>>>>>>> Hi guys,

>>>>>>>> we have an rbd pool we used for images exported via ceph-iscsi on a 17.2.7
>>>>>>>> cluster. The pool uses 10 times the disk space I would expect it to, and
>>>>>>>> after investigating we noticed a lot of rbd_data objects whose images are no
>>>>>>>> longer present. I assume that the original images were deleted using gwcli,
>>>>>>>> but not all objects were removed properly.

>>>>>>>> What would be the best/most secure way to get rid of these orphaned objects and
>>>>>>>> reclaim the disk space?

>>>>>>>> Best regards
>>>>>>>> Felix

>>>>>>>> ---------------------------------------------------------------------------------------------
>>>>>>>> Forschungszentrum Juelich GmbH
>>>>>>>> 52425 Juelich
>>>>>>>> Registered office: Juelich
>>>>>>>> Registered in the commercial register of the Dueren district court, no. HR B 3498
>>>>>>>> Chairman of the Supervisory Board: MinDir Stefan Müller
>>>>>>>> Management Board: Prof. Dr. Astrid Lambrecht (Chair),
>>>>>>>> Karsten Beneke (Deputy Chair), Prof. Dr. Ir. Pieter Jansens
>>>>>>>> ---------------------------------------------------------------------------------------------
>>>>>>>> _______________________________________________
>>>>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx

>>>>>>>> --
>>>>>>>> Alexander Patrakov

>>>>>>>> Kind regards
>>>>>>>> Felix Stolte
>>>>>>>> IT-Services
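
The 'head' check on the listsnaps output discussed in this thread can be scripted when many objects need to be inspected. Below is a minimal sketch, assuming the output keeps the whitespace-separated cloneid/snaps/size/overlap layout quoted above. The function name has_head is hypothetical and not part of any Ceph tool.

```shell
#!/bin/sh
# has_head (hypothetical helper): reads `rados listsnaps` output on stdin.
# Exit status is 0 if a row whose first column is "head" is present,
# and 1 otherwise.
has_head() {
    awk '$1 == "head" { found = 1 } END { exit !found }'
}

# Example use against a live cluster (pool and object names from the thread):
#   rados -p mailbox listsnaps rbd_data.26f7c5d05af621.0000000000002adf | has_head \
#       || echo "no head row: possible orphan clone"
```

A missing head row is only a hint that the object may be affected by the leaked snapshots issue (tracker 64646). It should be confirmed against the tracker description before running any cleanup command.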