Thanks! Also, how did you find out that you needed to use --disable-direct-io-mode?

On Wed, Mar 16, 2011 at 11:19 AM, Burnash, James <jburnash at knight.com> wrote:
> Yes - the "long" response time went to about 18 seconds from over a minute (at least).
>
> Whether or not this is generally a good idea is something I'll let the devs and the list respond to.
>
> James Burnash, Unix Engineering
>
> -----Original Message-----
> From: Mohit Anchlia [mailto:mohitanchlia at gmail.com]
> Sent: Wednesday, March 16, 2011 2:17 PM
> To: Burnash, James; gluster-users at gluster.org
> Subject: Re: Why does this setup not survive a node crash?
>
> Thanks for sharing! Did it make it any faster after changing network.ping-timeout to 10 secs?
>
> On Wed, Mar 16, 2011 at 11:07 AM, Burnash, James <jburnash at knight.com> wrote:
>> So - answering myself with the (apparent) solution. The configuration IS correct as shown - the problems were elsewhere.
>>
>> The primary cause seems to have been performing the Gluster native client mount on a virtual machine WITHOUT the "-O --disable-direct-io-mode" parameter.
>>
>> So I was mounting like this:
>>
>>        mount -t glusterfs jc1letgfs5:/test-pfs-ro1 /test-pfs2
>>
>> When I should have been doing this:
>>
>>        mount -t glusterfs -O --disable-direct-io-mode jc1letgfs5:/test-pfs-ro1 /test-pfs2
>>
>> Secondly, I changed the volume parameter "network.ping-timeout" from its default of 43 to 10 seconds, in order to get faster recovery from a downed storage node:
>>
>>        gluster volume set pfs-rw1 network.ping-timeout 10
>>
>> This configuration now survives the loss of either node of the two storage server mirrors. There is a noticeable delay before commands on the mount point complete the first time a command is issued after one of the nodes has gone down - but after that they return at the same speed as when all nodes were present.
>>
>> Thanks especially to all who helped, and to Anush, who helped me troubleshoot it from a different angle.
>>
>> James Burnash, Unix Engineering
>>
>> -----Original Message-----
>> From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Burnash, James
>> Sent: Friday, March 11, 2011 11:31 AM
>> To: gluster-users at gluster.org
>> Subject: Re: Why does this setup not survive a node crash?
>>
>> Could anyone else please take a peek at this and sanity-check my configuration? I'm quite frankly at a loss and tremendously under the gun ...
>>
>> Thanks in advance to any kind souls.
>>
>> James Burnash, Unix Engineering
>>
>> -----Original Message-----
>> From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of Burnash, James
>> Sent: Thursday, March 10, 2011 3:55 PM
>> To: gluster-users at gluster.org
>> Subject: Why does this setup not survive a node crash?
>>
>> Perhaps someone will see immediately, given the data below, why this configuration will not survive a crash of one node - it appears that any node crashed out of this set will cause Gluster native clients to hang until the node comes back.
>>
>> Given (2) initial storage servers (CentOS 5.5, Gluster 3.1.1):
>>
>> Starting out by creating a Replicated-Distributed pair with this command:
>>
>> gluster volume create test-pfs-ro1 replica 2 jc1letgfs5:/export/read-only/g01 jc1letgfs6:/export/read-only/g01 jc1letgfs5:/export/read-only/g02 jc1letgfs6:/export/read-only/g02
>>
>> Which ran fine (though I did not attempt to crash one of the pair).
>>
>> And then adding (2) more servers, identically configured, with this command:
>>
>> gluster volume add-brick test-pfs-ro1 jc1letgfs7:/export/read-only/g01 jc1letgfs8:/export/read-only/g01 jc1letgfs7:/export/read-only/g02 jc1letgfs8:/export/read-only/g02
>> Add Brick successful
>>
>> root at jc1letgfs5:~# gluster volume info
>>
>> Volume Name: test-pfs-ro1
>> Type: Distributed-Replicate
>> Status: Started
>> Number of Bricks: 4 x 2 = 8
>> Transport-type: tcp
>> Bricks:
>> Brick1: jc1letgfs5:/export/read-only/g01
>> Brick2: jc1letgfs6:/export/read-only/g01
>> Brick3: jc1letgfs5:/export/read-only/g02
>> Brick4: jc1letgfs6:/export/read-only/g02
>> Brick5: jc1letgfs7:/export/read-only/g01
>> Brick6: jc1letgfs8:/export/read-only/g01
>> Brick7: jc1letgfs7:/export/read-only/g02
>> Brick8: jc1letgfs8:/export/read-only/g02
>>
>> And this volfile info out of the log file /var/log/glusterfs/etc-glusterd-mount-test-pfs-ro1.log:
>>
>> [2011-03-10 14:38:26.310807] W [dict.c:1204:data_to_str] dict: @data=(nil)
>> Given volfile:
>> +------------------------------------------------------------------------------+
>>   1: volume test-pfs-ro1-client-0
>>   2:     type protocol/client
>>   3:     option remote-host jc1letgfs5
>>   4:     option remote-subvolume /export/read-only/g01
>>   5:     option transport-type tcp
>>   6: end-volume
>>   7:
>>   8: volume test-pfs-ro1-client-1
>>   9:     type protocol/client
>>  10:     option remote-host jc1letgfs6
>>  11:     option remote-subvolume /export/read-only/g01
>>  12:     option transport-type tcp
>>  13: end-volume
>>  14:
>>  15: volume test-pfs-ro1-client-2
>>  16:     type protocol/client
>>  17:     option remote-host jc1letgfs5
>>  18:     option remote-subvolume /export/read-only/g02
>>  19:     option transport-type tcp
>>  20: end-volume
>>  21:
>>  22: volume test-pfs-ro1-client-3
>>  23:     type protocol/client
>>  24:     option remote-host jc1letgfs6
>>  25:     option remote-subvolume /export/read-only/g02
>>  26:     option transport-type tcp
>>  27: end-volume
>>  28:
>>  29: volume test-pfs-ro1-client-4
>>  30:     type protocol/client
>>  31:     option remote-host jc1letgfs7
>>  32:     option remote-subvolume /export/read-only/g01
>>  33:     option transport-type tcp
>>  34: end-volume
>>  35:
>>  36: volume test-pfs-ro1-client-5
>>  37:     type protocol/client
>>  38:     option remote-host jc1letgfs8
>>  39:     option remote-subvolume /export/read-only/g01
>>  40:     option transport-type tcp
>>  41: end-volume
>>  42:
>>  43: volume test-pfs-ro1-client-6
>>  44:     type protocol/client
>>  45:     option remote-host jc1letgfs7
>>  46:     option remote-subvolume /export/read-only/g02
>>  47:     option transport-type tcp
>>  48: end-volume
>>  49:
>>  50: volume test-pfs-ro1-client-7
>>  51:     type protocol/client
>>  52:     option remote-host jc1letgfs8
>>  53:     option remote-subvolume /export/read-only/g02
>>  54:     option transport-type tcp
>>  55: end-volume
>>  56:
>>  57: volume test-pfs-ro1-replicate-0
>>  58:     type cluster/replicate
>>  59:     subvolumes test-pfs-ro1-client-0 test-pfs-ro1-client-1
>>  60: end-volume
>>  61:
>>  62: volume test-pfs-ro1-replicate-1
>>  63:     type cluster/replicate
>>  64:     subvolumes test-pfs-ro1-client-2 test-pfs-ro1-client-3
>>  65: end-volume
>>  66:
>>  67: volume test-pfs-ro1-replicate-2
>>  68:     type cluster/replicate
>>  69:     subvolumes test-pfs-ro1-client-4 test-pfs-ro1-client-5
>>  70: end-volume
>>  71:
>>  72: volume test-pfs-ro1-replicate-3
>>  73:     type cluster/replicate
>>  74:     subvolumes test-pfs-ro1-client-6 test-pfs-ro1-client-7
>>  75: end-volume
>>  76:
>>  77: volume test-pfs-ro1-dht
>>  78:     type cluster/distribute
>>  79:     subvolumes test-pfs-ro1-replicate-0 test-pfs-ro1-replicate-1 test-pfs-ro1-replicate-2 test-pfs-ro1-replicate-3
>>  80: end-volume
>>  81:
>>  82: volume test-pfs-ro1-write-behind
>>  83:     type performance/write-behind
>>  84:     subvolumes test-pfs-ro1-dht
>>  85: end-volume
>>  86:
>>  87: volume test-pfs-ro1-read-ahead
>>  88:     type performance/read-ahead
>>  89:     subvolumes test-pfs-ro1-write-behind
>>  90: end-volume
>>  91:
>>  92: volume test-pfs-ro1-io-cache
>>  93:     type performance/io-cache
>>  94:     subvolumes test-pfs-ro1-read-ahead
>>  95: end-volume
>>  96:
>>  97: volume test-pfs-ro1-quick-read
>>  98:     type performance/quick-read
>>  99:     subvolumes test-pfs-ro1-io-cache
>> 100: end-volume
>> 101:
>> 102: volume test-pfs-ro1-stat-prefetch
>> 103:     type performance/stat-prefetch
>> 104:     subvolumes test-pfs-ro1-quick-read
>> 105: end-volume
>> 106:
>> 107: volume test-pfs-ro1
>> 108:     type debug/io-stats
>> 109:     subvolumes test-pfs-ro1-stat-prefetch
>> 110: end-volume
>>
>> Any input would be greatly appreciated. I'm working beyond my deadline already, and I'm guessing that I'm not seeing the forest for the trees here.
>>
>> James Burnash, Unix Engineering
>>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
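
A note on making the mount fix from the thread above persistent across reboots: the same native-client mount can also go in /etc/fstab. This is only a sketch - the server, volume and mount point are taken from the thread, and it assumes the mount.glusterfs helper shipped with this release accepts a direct-io-mode option (worth verifying against man mount.glusterfs or the helper script itself before relying on it):

        # /etc/fstab entry for the native-client mount with direct I/O disabled
        # (server, volume and mount point taken from the thread above)
        jc1letgfs5:/test-pfs-ro1  /test-pfs2  glusterfs  direct-io-mode=disable,_netdev  0 0

With the entry in place, "mount /test-pfs2" (or "mount -a") should be equivalent to the explicit command-line form quoted above, provided the option is supported.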
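
And a rough way to confirm that the shorter network.ping-timeout actually takes effect when a storage node disappears. Again a sketch only: the volume name, hostnames and mount point come from this thread, and killing the brick daemons over ssh is just one crude stand-in for a real node crash:

        # set and confirm the shorter timeout (run on any storage server)
        gluster volume set test-pfs-ro1 network.ping-timeout 10
        gluster volume info test-pfs-ro1

        # simulate losing one storage node
        ssh jc1letgfs6 'pkill glusterfsd'

        # on a client with the volume mounted on /test-pfs2: the first command
        # after the failure should stall for roughly the ping-timeout, and
        # later ones should return at normal speed
        time ls /test-pfs2
        time ls /test-pfs2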