Hi,

I created an erasure coded pool and then created RBD images on it by specifying the ‘--data-pool’ parameter. I subsequently created protected snapshots and cloned them for the systems I was setting up. After finishing I realised that I hadn’t specified the ‘--data-pool’ parameter when creating the clones, damn! Any changes on the clones were being stored directly in the ‘rbd_ssd’ pool, instead of the erasure coded ‘ec_ssd’ pool…
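If anyone needs to check for the same mistake: an easy way to confirm whether an image was created with a separate data pool is to look for the ‘data_pool’ line in ‘rbd info’ (pool and image names below match my setup, adjust as required; the exact output formatting may differ between Ceph releases):

  for ID in 211 212 213 214; do
    for f in 1 2 3; do
      echo -n "vm-$ID-disk-$f: ";
      rbd info rbd_ssd/vm-$ID-disk-$f | grep data_pool || echo "no separate data pool";
    done
  done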
There were 4 systems with 3 disks each, so for each cloned drive I renamed it, created a new clone (using the ‘--data-pool’ switch this time) and then used some Perl that has been handy a whole bunch of times to copy over only the 4MB chunks whose MD5 hash didn’t match between the source and destination block devices. This way the source and destination images are 100% identical and any blocks that match the original parent are skipped.

PS: It would be nice to be able to retrieve the crc values for the object store blocks, as this would avoid reading the full images to calculate the MD5 sum per block…

  for ID in 211 212 213 214; do
    for f in 1 2 3; do
      rbd mv rbd_ssd/vm-$ID-disk-$f rbd_ssd/original-$ID-disk-$f;
      rbd clone rbd_ssd/base-210-disk-"$f"@__base__ rbd_ssd/vm-$ID-disk-"$f" --data-pool ec_ssd;
    done
  done

  rbd resize rbd_ssd/vm-213-disk-3 --size 50G;
  rbd resize rbd_ssd/vm-214-disk-3 --size 1T;

  for ID in 211 212 213 214; do
    for f in 1 2 3; do
      export dev1=`rbd map rbd_ssd/original-$ID-disk-$f --name client.admin -k /etc/pve/priv/ceph.client.admin.keyring`;
      export dev2=`rbd map rbd_ssd/vm-$ID-disk-$f --name client.admin -k /etc/pve/priv/ceph.client.admin.keyring`;
      perl -'MDigest::MD5 md5' -ne 'BEGIN{$/=\4194304};print md5($_)' $dev2 |
        perl -'MDigest::MD5 md5' -ne 'BEGIN{$/=\4194304};$b=md5($_); read STDIN,$a,16;if ($a eq $b) {print "s"} else {print "c" . $_}' $dev1 |
        perl -ne 'BEGIN{$/=\1} if ($_ eq"s") {$s++} else {if ($s) { seek STDOUT,$s*4194304,1; $s=0}; read ARGV,$buf,4194304; print $buf}' 1<> $dev2;
      rbd unmap $dev1;
      rbd unmap $dev2;
    done
  done

  # Compare amount of used space:
  for ID in 211 212 213 214; do
    for f in 1 2 3; do
      echo -e "\nNAME PROVISIONED USED";
      rbd du rbd_ssd/original-$ID-disk-"$f" 2> /dev/null | grep -P "^\S+disk-$f\s" | while read n a u; do printf "%-22s %9s %9s\n" $n $a $u; done;
      rbd du rbd_ssd/vm-$ID-disk-"$f" 2> /dev/null | grep -P "^\S+disk-$f\s" | while read n a u; do printf "%-22s %9s %9s\n" $n $a $u; done;
    done
  done

Sample output:

  NAME                 PROVISIONED      USED
  original-211-disk-1        4400M    28672k
  vm-211-disk-1              4400M    28672k

  NAME                 PROVISIONED      USED
  original-211-disk-2       30720M     6312M
  vm-211-disk-2             30720M     6300M

  NAME                 PROVISIONED      USED
  original-211-disk-3       20480M     2092M
  vm-211-disk-3             20480M     2088M

vm-211-disk-3 uses 4MB less data than original-211-disk-3, but validating the content of the images confirms that they are identical:

  ID=211; f=3;
  export dev1=`rbd map rbd_ssd/original-$ID-disk-$f --name client.admin -k /etc/pve/priv/ceph.client.admin.keyring`;
  export dev2=`rbd map rbd_ssd/vm-$ID-disk-$f --name client.admin -k /etc/pve/priv/ceph.client.admin.keyring`;
  dd if=$dev1 bs=128M 2> /dev/null | sha1sum;
  dd if=$dev2 bs=128M 2> /dev/null | sha1sum;
  rbd unmap $dev1;
  rbd unmap $dev2;

Output:

  979ab34ea645ef6f16c3dbb5d3a78152018ea8e7  -
  979ab34ea645ef6f16c3dbb5d3a78152018ea8e7  -
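For anyone trying to decipher the Perl one-liners used in the copy loop above, this is the same three-stage pipeline with each stage described (functionally identical to what the loop runs; 4194304 bytes = 4MB, matching the default RBD object size):

  # Stage 1: read the destination device in 4MB records and emit one raw
  #   16-byte MD5 digest per record.
  # Stage 2: read the source device in 4MB records, compare each record's
  #   digest against the destination digest arriving on stdin, then print "s"
  #   (skip) on a match, or "c" followed by the 4MB of source data on a mismatch.
  # Stage 3: walk the marker stream one byte at a time; consecutive "s" markers
  #   advance a skip counter, while a "c" first seeks stdout (the destination
  #   device, opened read/write by the shell's 1<> redirection) past any skipped
  #   blocks and then writes the 4MB of data that follows the marker.
  perl -'MDigest::MD5 md5' -ne 'BEGIN{$/=\4194304};print md5($_)' $dev2 |
    perl -'MDigest::MD5 md5' -ne 'BEGIN{$/=\4194304};$b=md5($_); read STDIN,$a,16;if ($a eq $b) {print "s"} else {print "c" . $_}' $dev1 |
    perl -ne 'BEGIN{$/=\1} if ($_ eq"s") {$s++} else {if ($s) { seek STDOUT,$s*4194304,1; $s=0}; read ARGV,$buf,4194304; print $buf}' 1<> $dev2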
PS: qemu-img runs much faster than the Perl nightmare above, as it knows which blocks contain data, BUT it copies the data every time, so using it with snapshot rotations results in each snapshot consuming the full source image data size. The Perl method has read overhead (Ceph does, however, feed it zeros for unallocated blocks, which aren’t actually read from anywhere), so it’s much slower than qemu-img but copies only the blocks that differ.

The following may also be useful to others. It’s a relatively simple script that uses the Perl method above to back up images from one pool to another. The script could easily be tweaked to use LVM snapshots as a destination, and the method is compatible with any block device.
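As a rough illustration of the LVM variation (the volume group ‘vg_backup’ and LV names here are purely hypothetical), the destination handed to the pipeline can just as well be a logical volume, with LVM snapshots rotated instead of RBD snapshots:

  # Rotate LVM snapshots first, mirroring the RBD snapshot rotation below
  # (the 5G COW size is illustrative):
  lvremove -f /dev/vg_backup/vm-211-disk-3_backup_snap3 2> /dev/null;
  lvrename vg_backup vm-211-disk-3_backup_snap2 vm-211-disk-3_backup_snap3 2> /dev/null;
  lvrename vg_backup vm-211-disk-3_backup_snap1 vm-211-disk-3_backup_snap2 2> /dev/null;
  lvcreate -s -L 5G -n vm-211-disk-3_backup_snap1 /dev/vg_backup/vm-211-disk-3_backup;
  # Then run the copy with the LV as the destination device:
  dev1=`rbd map rbd_ssd/vm-211-disk-3 --name client.admin -k /etc/pve/priv/ceph.client.admin.keyring`;
  dev2=/dev/vg_backup/vm-211-disk-3_backup;
  # ... same three-stage Perl pipeline as above, using $dev1 and $dev2 ...
  rbd unmap $dev1;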
Notes: We have rbd_ssd/base-210-disk-X with a protected ‘__base__’ snapshot (the clone parent) and then 4 clone children, where each VM has 3 disks. As a prerequisite you would need to create the destination images and ensure that their sizes match the source images (a quick sketch of this follows).
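A minimal sketch of that prerequisite, assuming jq is available to pull the size in bytes out of ‘rbd info --format json’ (any other way of obtaining the byte size works just as well):

  for ID in 211 212 213 214; do
    for f in 1 2 3; do
      # Source image size in bytes, converted to MB for 'rbd create' (its default unit):
      size=`rbd info --format json rbd_ssd/vm-$ID-disk-$f | jq -r .size`;
      rbd create rbd_hdd/vm-$ID-disk-"$f"_backup --size $((size / 1024 / 1024));
    done
  done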
The following script rotates 3 snapshots each time it runs and additionally creates a snapshot of the source images (not the static clone parent) before comparing the block devices:

  #!/bin/sh
  src='rbd_ssd';
  dst='rbd_hdd';

  rbdsnap () {
    [ "x" = "$1"x ] && return 1;
    [ `rbd snap ls $1 | grep -Pc "^\s+\d+\s+$2\s"` -gt 0 ] && return 0 || return 1;
  }

  # Backup 'template-debian-9.3' (clone parent) - should never change, so there is
  # no need to maintain snapshots or to run this on a continual basis:
  #for ID in 210; do
  #  for f in 1 2 3; do
  #    echo -en "\t\t : Copying "$src"/base-"$ID"-disk-"$f"@__base__ to "$dst"/vm-"$ID"-disk-"$f"_backup";
  #    qemu-img convert -f raw -O raw -t unsafe -T unsafe -nWp -S 4M rbd:"$src"/base-"$ID"-disk-"$f"@__base__ rbd:"$dst"/vm-"$ID"-disk-"$f"_backup;
  #  done
  #done

  # Backup images (clone children):
  for ID in 211 212 213 214; do
    for f in 1 2 3; do
      rbdsnap "$dst"/vm-"$ID"-disk-"$f"_backup snap3 && rbdsnap "$dst"/vm-"$ID"-disk-"$f"_backup snap2 &&
        rbd snap rm "$dst"/vm-"$ID"-disk-"$f"_backup@snap3;
      rbdsnap "$dst"/vm-"$ID"-disk-"$f"_backup snap3 || rbdsnap "$dst"/vm-"$ID"-disk-"$f"_backup snap2 &&
        rbd snap rename "$dst"/vm-"$ID"-disk-"$f"_backup@snap2 "$dst"/vm-"$ID"-disk-"$f"_backup@snap3;
      rbdsnap "$dst"/vm-"$ID"-disk-"$f"_backup snap2 || rbdsnap "$dst"/vm-"$ID"-disk-"$f"_backup snap1 &&
        rbd snap rename "$dst"/vm-"$ID"-disk-"$f"_backup@snap1 "$dst"/vm-"$ID"-disk-"$f"_backup@snap2;
      rbdsnap "$dst"/vm-"$ID"-disk-"$f"_backup snap1 ||
        rbd snap create "$dst"/vm-"$ID"-disk-"$f"_backup@snap1;
      rbd snap create "$src"/vm-"$ID"-disk-"$f"@backupinprogress;
    done
    for f in 1 2 3; do
      echo -en "\t\t : Copying "$src"/vm-"$ID"-disk-"$f" to "$dst"/vm-"$ID"-disk-"$f"_backup";
      #qemu-img convert -f raw -O raw -t unsafe -T unsafe -nWp -S 4M rbd:"$src"/vm-"$ID"-disk-"$f"@backupinprogress rbd:"$dst"/vm-"$ID"-disk-"$f"_backup;
      export dev1=`rbd map "$src"/vm-"$ID"-disk-"$f@backupinprogress" --name client.admin -k /etc/pve/priv/ceph.client.admin.keyring`;
      export dev2=`rbd map "$dst"/vm-"$ID"-disk-"$f"_backup --name client.admin -k /etc/pve/priv/ceph.client.admin.keyring`;
      perl -'MDigest::MD5 md5' -ne 'BEGIN{$/=\4194304};print md5($_)' $dev2 |
        perl -'MDigest::MD5 md5' -ne 'BEGIN{$/=\4194304};$b=md5($_); read STDIN,$a,16;if ($a eq $b) {print "s"} else {print "c" . $_}' $dev1 |
        perl -ne 'BEGIN{$/=\1} if ($_ eq"s") {$s++} else {if ($s) { seek STDOUT,$s*4194304,1; $s=0}; read ARGV,$buf,4194304; print $buf}' 1<> $dev2;
      rbd unmap $dev1;
      rbd unmap $dev2;
      rbd snap rm "$src"/vm-"$ID"-disk-"$f"@backupinprogress;
    done
  done

Commenting out everything from ‘export dev1’ through ‘rbd unmap $dev2’ and uncommenting the qemu-img command yields the following:

  real    0m48.598s
  user    0m14.583s
  sys     0m10.986s

  [admin@kvm5a ~]# rbd du rbd_hdd/vm-211-disk-3_backup
  NAME                         PROVISIONED    USED
  vm-211-disk-3_backup@snap3        20480M   2764M
  vm-211-disk-3_backup@snap2        20480M   2764M
  vm-211-disk-3_backup@snap1        20480M   2764M
  vm-211-disk-3_backup              20480M   2764M
  <TOTAL>                           20480M  11056M

Repeating the copy using the Perl solution is much slower, but as the VM is currently off nothing has changed, so each snapshot consumes zero data:

  real    1m49.000s
  user    1m34.339s
  sys     0m17.847s

  [admin@kvm5a ~]# rbd du rbd_hdd/vm-211-disk-3_backup
  warning: fast-diff map is not enabled for vm-211-disk-3_backup. operation may be slow.
  NAME                         PROVISIONED    USED
  vm-211-disk-3_backup@snap3        20480M   2764M
  vm-211-disk-3_backup@snap2        20480M       0
  vm-211-disk-3_backup@snap1        20480M       0
  vm-211-disk-3_backup              20480M       0
  <TOTAL>                           20480M   2764M
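On the fast-diff warning above: if the object-map and fast-diff features weren’t enabled when the backup images were created, they can be enabled afterwards and the object map rebuilt, roughly as follows (object-map requires exclusive-lock; enabling a feature that is already on just returns an error and does no harm):

  for ID in 211 212 213 214; do
    for f in 1 2 3; do
      rbd feature enable rbd_hdd/vm-$ID-disk-"$f"_backup exclusive-lock;
      rbd feature enable rbd_hdd/vm-$ID-disk-"$f"_backup object-map;
      rbd feature enable rbd_hdd/vm-$ID-disk-"$f"_backup fast-diff;
      rbd object-map rebuild rbd_hdd/vm-$ID-disk-"$f"_backup;
    done
  done

Snapshots taken before the feature was enabled may still trigger the warning until their object maps are rebuilt as well (the rebuild command also accepts an image@snap spec, if I recall correctly).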
PS: I’m not sure whether this is a Ceph display bug, but why would the base image be reported as not consuming any data while the first snapshot (rotated to ‘snap3’) reports all the usage?

Purging all snapshots yields the following:

  [admin@kvm5a ~]# rbd du rbd_hdd/vm-211-disk-3_backup
  warning: fast-diff map is not enabled for vm-211-disk-3_backup. operation may be slow.
  NAME                  PROVISIONED    USED
  vm-211-disk-3_backup       20480M   2764M

Regards
David Herselman