If you had corruption in your backing RBD parent image snapshot, the clones may or may not be affected, depending on whether a CoW was performed within the clone over the corrupted section while it was corrupted. Therefore, the safest course of action would be to check each guest VM to ensure that it isn't affected. If you know which backing parent objects were corrupted, you could map them to image extents and then use "rbd diff" against each clone to see whether it has any data registered within those regions. If it doesn't, the image should be fine; if it does, you won't know whether the data was CoWed before, during, or after the corruption episode, so you would need to check that image.

On Thu, Aug 4, 2016 at 8:07 PM, John Holder <jholder@xxxxxxxxxxxxxxx> wrote:
> Hello!
>
> I would like some guidance about how to proceed with a problem inside of a
> snap which is used to clone images. My sincere apologies if what I am asking
> isn't possible.
>
> I have a snapshot which is used to create clones for guest virtual machines.
> It is a raw object with an NTFS OS contained within it.
>
> My understanding is that when you clone the snap, all children become bound
> to the parent snap via layering.
>
> We had a system problem from which I was able to recover almost fully. I
> could go into details, but I figure if I do, the advice will be to upgrade
> past dumpling (I can see you shaking your head :D). Upgrading is in the very
> short-term plan; I just want to be sure my cluster is as clean as I can make
> it before I do.
>
> Recently, new clones and old clones started having a problem with the drive
> inside of Windows. It seems to be an NTFS index issue, which I can fix (I've
> exported and verified the fix).
>
> So I have only 4 pretty simple questions:
>
> 1) Would it be right to assume that if I fix the snapshot's NTFS problem,
> the fix would 'cascade' to all cloned VMs? If not, I'm assuming I have to
> repair all clones individually (which I can script).
> 2) Am I off base in thinking the problem is in the snapshot? Could it have
> been in the source image all along?
> 3) If there is no relationship with this snap or the master image, am I
> correct to assume that this is an individual problem on each of these
> guests? Or is there a source I should look at?
> 4) Would upgrading to at least firefly resolve this issue?
>
> I've run many checks on the cluster and the data seems fully accessible and
> correct. There are no inconsistent PGs; everything exports, snapshots, and
> can be moved. I also have the gdb debugger attached to watch for things
> which may arise in this version of Ceph. I'll be upgrading once I find the
> answer to this.
>
> I have also attempted to ensure the parent/child relationship is intact at
> HEAD by rolling back to the snap, as mentioned on this mailing list in
> January.
>
> Many thanks for your time!
>
> --
> John Holder
> Trapp Technology
> Developer, Linux, & Mail Operations
> Complacency kills innovation, but ambition kills complacency.
> Office: 602-443-9145 x2017
> On Call Cell: 480-548-3902
> Skype: z_jholder
> Alt-Email: jholder@xxxxxxxxxxxxx

--
Jason
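
For illustration, here is a minimal sketch of the check described above, using the python-rbd bindings. It assumes format-2 images with the default 4 MiB (order 22) object size, so a corrupted parent object number N maps to the image byte range [N * 4 MiB, (N + 1) * 4 MiB); the pool name, clone names, and object numbers are placeholders, not values from this thread. As in the reply above, any data reported in those ranges only means the clone needs a manual check; it does not tell you when the CoW happened.

# Sketch: map corrupted parent objects to image extents and ask each clone
# whether an "rbd diff"-style iteration reports any data in those ranges.
import rados
import rbd

OBJECT_ORDER = 22                      # assumed default: 4 MiB objects
CORRUPT_OBJECTS = [0x1234, 0x1235]     # hypothetical parent object numbers
CLONES = ['vm-disk-01', 'vm-disk-02']  # hypothetical clone image names
POOL = 'rbd'                           # hypothetical pool name

# Corrupted object N covers image bytes [N << order, (N + 1) << order).
extents = [(n << OBJECT_ORDER, 1 << OBJECT_ORDER) for n in CORRUPT_OBJECTS]

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx(POOL)
    try:
        for name in CLONES:
            image = rbd.Image(ioctx, name, read_only=True)
            try:
                hits = []

                def record(offset, length, exists):
                    # diff_iterate reports (offset, length, exists) per extent
                    if exists:
                        hits.append((offset, length))

                for offset, length in extents:
                    image.diff_iterate(offset, length, None, record)

                # A hit only means "inspect this guest"; it cannot tell you
                # whether the CoW happened before or after the corruption.
                print('%s: %s' % (name,
                                  'check manually' if hits else 'no data in corrupted ranges'))
            finally:
                image.close()
    finally:
        ioctx.close()
finally:
    cluster.shutdown()

Passing None as the from-snapshot asks for all data present in those ranges, which keeps the check conservative: an empty result means the clone has nothing to lose there, while any reported extent sends you back to inspecting that guest, exactly as described above.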