On Thu, May 2, 2024 at 7:42, Satoru Takeuchi <satoru.takeuchi@xxxxxxxxx> wrote:
>
> Hi Maged,
>
> On Thu, May 2, 2024 at 5:34, Maged Mokhtar <mmokhtar@xxxxxxxxxxx> wrote:
>>
>>
>> On 01/05/2024 16:12, Satoru Takeuchi wrote:
>> > I confirmed that incomplete data is left behind when `rbd import-diff` fails.
>> > I guess that this data is part of the snapshot. Could someone answer
>> > the following questions?
>> >
>> > Q1. Is it safe to use the RBD image (e.g. client I/O and snapshot
>> > management) even though incomplete data exists?
>> > Q2. Is there any way to clean up the incomplete data?
>> >
>> > I read the following document and understand that this problem will be
>> > resolved after running `rbd import-diff` again.
>> >
>> > https://ceph.io/en/news/blog/2013/incremental-snapshots-with-rbd/
>> >> Since overwriting the same data is idempotent, it’s safe to have an import-diff interrupted in the middle.
>> > However, that is difficult if I can no longer access the exported backup
>> > data. For instance, I'm afraid of the following scenario.
>> >
>> > 1. Send the backup data from one DC (DC0) to another DC (DC1) periodically.
>> > 2. The backup data is created in DC0 and is sent directly to DC1
>> > without being persisted as a file.
>> > 3. A major power outage happens in DC0, and it's impossible to
>> > regenerate the backup data for a long time.
>> >
>> > I simulated this problem as follows:
>> >
>> > 1. Create an RBD image.
>> > 2. Write some data to this image.
>> > 3. Create a snapshot S0.
>> > 4. Write more data to this image.
>> > 5. Create a snapshot S1.
>> > 6. Create backup data consisting of the difference between S0 and S1
>> > by running rbd export-diff.
>> > 7. Delete the last byte of the backup data, which is 'e' and marks the
>> > end of the backup data, to inject an import-diff failure.
>> > 8. Delete S1.
>> > 9. Run rbd import-diff to apply the broken backup data created in step 7.
>> >
>> > Then step 9 failed and S1 was not created.
>> > However, the number of RADOS objects and the storage usage have increased.
>> >
>> > before:
>> > ```
>> > $ rados -p replicapool df
>> > POOL_NAME    USED    OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS  RD       WR_OPS  WR       USED COMPR  UNDER COMPR
>> > replicapool  11 MiB  24       9       24      0                   0        0         3609    53 MiB   279     41 MiB   0 B         0 B
>> >
>> > total_objects 24
>> > total_used 39 MiB
>> > total_avail 32 GiB
>> > total_space 32 GiB
>> > ```
>> >
>> > after:
>> > ```
>> > $ rados -p replicapool df
>> > POOL_NAME    USED    OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS  RD       WR_OPS  WR       USED COMPR  UNDER COMPR
>> > replicapool  12 MiB  25       9       25      0                   0        0         3531    53 MiB   278     41 MiB   0 B         0 B
>> >
>> > total_objects 25
>> > total_used 40 MiB
>> > total_avail 32 GiB
>> > total_space 32 GiB
>> > ```
>> >
>> > The incomplete data seems to increase if rbd import-diff fails again
>> > and again. The following output was obtained after repeating step 9
>> > above 100 times.
>> >
>> > ```
>> > $ rados -p replicapool df
>> > POOL_NAME    USED    OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS  RD       WR_OPS  WR       USED COMPR  UNDER COMPR
>> > replicapool  12 MiB  25       9       25      0                   0        0         7925    104 MiB  1308    164 MiB  0 B         0 B
>> >
>> > total_objects 25
>> > total_used 58 MiB
>> > total_avail 32 GiB
>> > total_space 32 GiB
>> > ```
>> >
>> > Thanks,
>> > Satoru
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users@xxxxxxx
>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
>> The image is not in a consistent state, so it should not be used as is.
>> If you no longer have access to the source image or its exported data,
>> you should be able to use the rbd snap rollback command to roll back the
>> destination image to its last known good snapshot. The destination
>> snapshots are created by the import-diff command with names matching
>> the source snapshots.
>
>
> Thank you for the reply.
> I succeeded in rolling back the RBD image to S0, and `total_objects`
> returned to its previous value (24).
>
> On the other hand, `total_used` did not return to its original value.
> Repeating the following steps made `total_used` grow continuously.
>
> 1. Import the broken diff (it fails).
> 2. Roll back to S0.
>
> I guess it's a resource leak.
>
> Could you tell me whether I can clean up this remaining garbage data?

I verified the behavior of rollback after an rbd import-diff failure, and the garbage data seems to disappear. I opened a new issue to find out whether the garbage data disappears in all cases.

https://tracker.ceph.com/issues/65873

Thanks again, Maged, for answering my question.

Best,
Satoru
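
For anyone who wants to retry this, here is a minimal shell sketch of steps 6 to 9 from the original report, followed by the rollback to S0 that Maged suggested. It is only a sketch under assumptions: the image name, the diff path, and the function names `drop_end_marker` and `reproduce_and_rollback` are hypothetical. Only the pool name `replicapool` and the snapshot names S0 and S1 come from the thread. It assumes the image already has snapshots S0 and S1 (steps 1 to 5).

```shell
#!/bin/sh
# Sketch only. IMAGE and DIFF are hypothetical placeholders; POOL, S0 and S1
# follow the names used in the thread.
POOL="${POOL:-replicapool}"
IMAGE="${IMAGE:-testimg}"
DIFF="${DIFF:-/tmp/s0-s1.diff}"

# Step 7: remove the final byte of the diff file, but only when that byte is
# the 'e' end marker mentioned in the report. Returns non-zero otherwise.
drop_end_marker() {
    last=$(tail -c 1 "$1")
    if [ "$last" != "e" ]; then
        echo "last byte of $1 is not 'e'; leaving the file untouched" >&2
        return 1
    fi
    truncate -s -1 "$1"
}

# Steps 6 to 9, then roll back to S0 and print the pool usage.
reproduce_and_rollback() {
    rbd export-diff --from-snap S0 "$POOL/$IMAGE@S1" "$DIFF" || return 1
    drop_end_marker "$DIFF" || return 1
    rbd snap rm "$POOL/$IMAGE@S1" || return 1
    # The truncated diff is expected to make import-diff fail.
    if rbd import-diff "$DIFF" "$POOL/$IMAGE"; then
        echo "import-diff unexpectedly succeeded" >&2
        return 1
    fi
    rbd snap rollback "$POOL/$IMAGE@S0" || return 1
    rados -p "$POOL" df
}
```

Running the script only defines the two functions. Calling `reproduce_and_rollback` in a loop and comparing `total_objects` and `total_used` between iterations should show whether usage keeps growing on a given cluster.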