If you had corruption in your backing RBD parent image snapshot, the clones may or may not be affected, depending on whether a CoW was performed within the clone over the corrupted section while it was corrupted. Therefore, the safest course of action would be to check each guest VM to ensure that it isn't affected. If you know which backing parent objects were corrupted, you could map them to image extents and then use "rbd diff" against each clone to see whether it has any data registered within those regions. If it doesn't, the image should be fine; if it does, you won't know whether the data was CoWed before, during, or after the corruption episode, so you would need to check that image.

On Thu, Aug 4, 2016 at 8:07 PM, John Holder <jholder@xxxxxxxxxxxxxxx> wrote:
> Hello!
>
> I would like some guidance about how to proceed with a problem inside of a
> snap which is used to clone images. My sincere apologies if what I am asking
> isn't possible.
>
> I have a snapshot which is used to create clones for guest virtual machines.
> It is a raw object with an NTFS OS contained within it.
>
> My understanding is that when you clone the snap, all children become bound
> to the parent snap via layering.
>
> We had a system problem from which I was able to recover almost fully. I
> could go into details, but I figure if I do, the advice will be to upgrade
> past dumpling (I can see you shaking your head :D). Upgrading is in the very
> short-term plan; I just want to be sure my cluster is as clean as I can make
> it before I do.
>
> Recently, new clones and old clones started having a problem with the drive
> inside of Windows. It seems to be an NTFS index issue, which I can fix (I've
> exported and verified the fix).
>
> So I have only 4 pretty simple questions:
>
> 1) Would it be right to assume that if I fix the snapshot's NTFS problem,
> the fix would 'cascade' to all cloned VMs? If not, I'm assuming I have to
> repair all clones individually (which I can script).
> 2) Am I off base in thinking the problem is in the snapshot? Could it have
> been in the source image all along?
> 3) If there is no relationship with this snap or the master image, am I
> correct to assume that this is an individual problem on each of these
> guests? Or is there a source I should look at?
> 4) Would upgrading to at least firefly resolve this issue?
>
> I've run many checks on the cluster and the data seems fully accessible and
> correct. There are no inconsistent PGs; everything exports, snapshots, and
> can be moved. I also have the gdb debugger attached to watch for things
> which may arise in this version of Ceph. I'll be upgrading once I find the
> answer to this.
>
> I have also attempted to ensure the parent/child relationship is intact at
> HEAD by rolling back to the snap, as mentioned on this mailing list in
> January.
>
> Many thanks for your time!
>
> --
> John Holder
> Trapp Technology
> Developer, Linux, & Mail Operations
> Complacency kills innovation, but ambition kills complacency.
> Office: 602-443-9145 x2017
> On Call Cell: 480-548-3902
> Skype: z_jholder
> Alt-Email: jholder@xxxxxxxxxxxxx

--
Jason
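
For illustration, here is a minimal sketch of the check described above, using the python-rbd bindings. It assumes format-2 images with the default 4 MiB (order 22) object size, so a corrupted parent object number N maps to the image byte range [N * 4 MiB, (N + 1) * 4 MiB); the pool name, clone names, and object numbers are placeholders, not values from this thread. As in the reply above, any data reported in those ranges only means the clone needs a manual check; it does not tell you when the CoW happened.

# Sketch: map corrupted parent objects to image extents and ask each clone
# whether an "rbd diff"-style iteration reports any data in those ranges.
import rados
import rbd

OBJECT_ORDER = 22                      # assumed default: 4 MiB objects
CORRUPT_OBJECTS = [0x1234, 0x1235]     # hypothetical parent object numbers
CLONES = ['vm-disk-01', 'vm-disk-02']  # hypothetical clone image names
POOL = 'rbd'                           # hypothetical pool name

# Corrupted object N covers image bytes [N << order, (N + 1) << order).
extents = [(n << OBJECT_ORDER, 1 << OBJECT_ORDER) for n in CORRUPT_OBJECTS]

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx(POOL)
    try:
        for name in CLONES:
            image = rbd.Image(ioctx, name, read_only=True)
            try:
                hits = []

                def record(offset, length, exists):
                    # diff_iterate reports (offset, length, exists) per extent
                    if exists:
                        hits.append((offset, length))

                for offset, length in extents:
                    image.diff_iterate(offset, length, None, record)

                # A hit only means "inspect this guest"; it cannot tell you
                # whether the CoW happened before or after the corruption.
                print('%s: %s' % (name,
                                  'check manually' if hits else 'no data in corrupted ranges'))
            finally:
                image.close()
    finally:
        ioctx.close()
finally:
    cluster.shutdown()

Passing None as the from-snapshot asks for all data present in those ranges, which keeps the check conservative: an empty result means the clone has nothing to lose there, while any reported extent sends you back to inspecting that guest, exactly as described above.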