On Dec 31, 2007 2:03 AM, Gareth Bult <gareth@xxxxxxxxxxxxx> wrote:
> Hi,
>
> Many thanks, a fix would be great .. :)
>
> I've been doing a little more testing and can confirm that AFR definitely
> does not honor "sparse" when healing.
>
> This is particularly noticeable when using XEN images.
>
> A typical XEN image might be 3G, for example, with "du" reporting 600M used.
> After "healing" the image to another brick, it shows 3G in size and du
> shows 3G used.

Ah, OK. As I said in the other thread, I have not tested how AFR self-heal
behaves with holes; I need to investigate. It will certainly be fixed.
Please open a bug ID (I think you already have) and track it so it gets
fixed :)
(For anyone unfamiliar with the issue, a rough sketch of what hole-preserving
copying involves is appended below the thread.)

>
> This makes a fair difference to my "images" volume (!)
>
> [in addition to the problems when applied to stripes!]

Yes, you had mentioned that all AFR subvolumes heal instead of only the ones
which were modified. Thanks for reporting the issues.

-Krishna

>
> Regards,
> Gareth.
>
>
> ----- Original Message -----
> From: "Krishna Srinivas" <krishna@xxxxxxxxxxxxx>
> To: "Gareth Bult" <gareth@xxxxxxxxxxxxx>
> Cc: "gluster-devel" <gluster-devel@xxxxxxxxxx>
> Sent: Sunday, December 30, 2007 8:10:42 PM (GMT) Europe/London
> Subject: Re: AFR Heal Bug
>
> Hi Gareth,
>
> Yes, this bug was introduced recently, after we changed the way the
> readdir() call works in glusterfs: AFR is calling readdir() only on the
> first child (which is blank in your case). A fix will be on its way in a
> couple of days.
>
> Thanks
> Krishna
>
> On Dec 31, 2007 12:39 AM, Gareth Bult <gareth@xxxxxxxxxxxxx> wrote:
> > Ok, I'm going to call it a bug, tell me if I'm wrong .. :)
> >
> > (two servers, both define a "homes" volume)
> >
> > Client:
> >
> > volume nodea-homes
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host nodea
> >   option remote-subvolume homes
> > end-volume
> >
> > volume nodeb-homes
> >   type protocol/client
> >   option transport-type tcp/client
> >   option remote-host nodeb
> >   option remote-subvolume homes
> > end-volume
> >
> > volume homes-afr
> >   type cluster/afr
> >   subvolumes nodea-homes nodeb-homes   ### ISSUE IS HERE! ###
> >   option scheduler rr
> > end-volume
> >
> > Assume the system is completely up to date and working OK.
> > Mount the homes filesystem on "client".
> > Kill the "nodea" server.
> > The system carries on, effectively using nodeb.
> >
> > Wipe nodea's physical volume.
> > Restart the nodea server.
> >
> > All of a sudden, "client" sees an empty "homes" filesystem, although the
> > data is still in place on "B" and "A" is blank.
> > i.e. the client is seeing the blank "nodea" only (!)
> >
> > .. at this point you check nodeb to make sure your data really is there,
> > then you can mop up the coffee you've just spat all over your screens ..
> >
> > If you crash nodeb instead, there appears to be no problem, and a
> > self-heal "find" will correct the blank volume.
> > Alternatively, if you reverse the subvolumes as listed above, you don't
> > see the problem.
> >
> > The issue appears to be the blanking of the first subvolume.
> >
> > I'm thinking the order of the volumes should not be an issue; gluster
> > should know that one volume is empty/new and one contains real data and
> > act accordingly, rather than relying on the order the volumes are
> > listed in .. (???)
> >
> > I'm using fuse glfs7 and gluster 1.3.8 (tla).
> > _______________________________________________
> > Gluster-devel mailing list
> > Gluster-devel@xxxxxxxxxx
> > http://lists.nongnu.org/mailman/listinfo/gluster-devel
> >
> >
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
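
For context on the sparse-image point above: a heal that simply reads every
byte from the good brick and writes it to the new one fills in the holes,
which is why a 3G image with 600M allocated comes out with 3G allocated on
the healed copy. The sketch below is plain C, not GlusterFS source; the
64 KiB block size and the command-line file arguments are arbitrary choices
for illustration. It shows the usual hole-preserving technique of skipping
all-zero blocks with lseek() on the destination, which is roughly what
cp --sparse=always and rsync --sparse do.

/*
 * Hole-preserving copy: a minimal sketch, NOT GlusterFS code.
 * Blocks that are entirely zero are skipped with lseek() on the
 * destination instead of being written, so the copy stays sparse.
 * The 64 KiB block size is an arbitrary choice for illustration.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define BLK (64 * 1024)

/* Return 1 if the first len bytes of buf are all zero. */
static int all_zero(const char *buf, ssize_t len)
{
    ssize_t i;
    for (i = 0; i < len; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}

int main(int argc, char *argv[])
{
    char buf[BLK];
    ssize_t n;
    off_t total = 0;
    int src, dst;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
        return 1;
    }

    src = open(argv[1], O_RDONLY);
    dst = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (src < 0 || dst < 0) {
        perror("open");
        return 1;
    }

    while ((n = read(src, buf, sizeof(buf))) > 0) {
        if (all_zero(buf, n)) {
            /* Leave a hole: advance the destination offset, write nothing. */
            if (lseek(dst, n, SEEK_CUR) < 0) {
                perror("lseek");
                return 1;
            }
        } else if (write(dst, buf, n) != n) {
            perror("write");
            return 1;
        }
        total += n;
    }

    /* A trailing hole is only kept if the file is truncated up to full size. */
    if (ftruncate(dst, total) < 0) {
        perror("ftruncate");
        return 1;
    }

    close(src);
    close(dst);
    return 0;
}

Comparing "du" against "du --apparent-size" (or looking at "ls -ls") on the
destination afterwards shows whether the copy kept its holes.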