On 5/28/2017 9:24 PM, Ravishankar N wrote:
Just to elaborate further: if all nodes were up to begin with and
there were zero self-heals pending, and you only brought down
gluster2, writes should still be allowed. I guess in your case there
must have been some heals pending from gluster2 to gluster1 before you
brought gluster2 down, due to a network disconnect from the fuse mount
to gluster1.
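A quick way to check that is the built-in heal info command; VOLNAME
here stands in for the actual volume name:

    gluster volume heal VOLNAME info
    gluster volume heal VOLNAME info split-brain

The first command lists, per brick, the entries that still need
healing; the second lists any entries currently in split-brain.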
OK, I was aggressively writing within and to those VMs, all while
pulling cables (power and network). My initial observation was that
the shards healed quickly, but perhaps I got too aggressive and didn't
wait long enough between tests for the healing to kick in and/or
finish.
I will retest and pay attention to outstanding heals, both prior to and
during the tests.
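One way to keep an eye on them while a test runs is to poll the heal
counts; this assumes the heal-count statistics sub-command is available
in the installed version, and VOLNAME is again a placeholder:

    watch -n 10 gluster volume heal VOLNAME statistics heal-count

A steady zero across all bricks before and after pulling a cable means
there was nothing outstanding to heal.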
I suppose I could fiddle with the quorum settings as above, but I'd
like to be able to PAUSE/FLUSH/FSYNC the Volume before taking down
Gluster2, then unpause and let the volume continue with Gluster1 and
the ARB providing some protection, and helping when Gluster2 is
returned to the cluster.
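On the quorum side, the relevant client-side options can at least be
inspected and, if need be, relaxed; the sketch below assumes a
replica 3 arbiter 1 volume named VOLNAME, and note that a fixed
quorum-count of 1 trades away split-brain protection:

    # show the current client-quorum behaviour (arbiter volumes default to 'auto')
    gluster volume get VOLNAME cluster.quorum-type
    # only if the split-brain risk is acceptable: allow writes with a single brick up
    gluster volume set VOLNAME cluster.quorum-type fixed
    gluster volume set VOLNAME cluster.quorum-count 1

For the PAUSE/FLUSH step, quiescing the guests from the hypervisor
before taking Gluster2 down would achieve roughly the same thing; with
libvirt and the qemu guest agent installed it could look like this,
where VM_NAME is a hypothetical guest name:

    virsh domfsfreeze VM_NAME     # flush and freeze the guest filesystems
    # ... stop gluster2, do the maintenance, bring it back ...
    virsh domfsthaw VM_NAME       # resume guest I/O

If the guest agent isn't available, virsh suspend/resume gives a
coarser but similar effect.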
I think you should try to find out whether there were self-heals
pending to gluster1 before you brought gluster2 down; otherwise the
VMs should not have paused.
Yes, I'll start looking at heals PRIOR to yanking cables.
OK, can I assume SOME pause is expected when Gluster first sees gluster2
go down, which would unpause after a timeout period? I have seen that
behaviour as well.
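That initial hang is governed by the client's ping timeout, if I'm not
mistaken: the fuse mount waits that long before declaring the brick
dead, and I/O stalls in the meantime. The value can be checked (and,
cautiously, lowered) per volume; VOLNAME is a placeholder:

    gluster volume get VOLNAME network.ping-timeout     # default is 42 seconds
    gluster volume set VOLNAME network.ping-timeout 30  # example only; very low values risk spurious disconnects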
-bill