Hello,

does anyone have any comments on the issues I described below? Any feedback would be more than welcome. Thanks.

On 16.4.2011. 21:01, Emir Imamagic wrote:
> Hello,
>
> I am trying to find a precise definition of the GlusterFS native
> client's behavior when a node in a distributed volume fails. Some
> information is provided in the FAQ:
>
> http://www.gluster.com/community/documentation/index.php/GlusterFS_Technical_FAQ#What_happens_if_a_GlusterFS_brick_crashes.3F
>
> but it doesn't go into details.
> The other information I managed to find is this stale document:
>
> http://www.gluster.com/community/documentation/index.php/Understanding_DHT_Translator
>
> The document says that files on the failed node will not be visible to
> the client. However, the behavior of open file handles is not described.
>
> I ran a couple of simple tests with the cp and sha1sum commands to see
> what happens. Server configuration:
>
> Volume Name: test
> Type: Distribute
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: gluster1:/data
> Brick2: gluster2:/data
> Options Reconfigured:
> performance.stat-prefetch: off
> performance.write-behind-window-size: 4MB
> performance.io-thread-count: 8
>
> On the client side I use the default mount without any additional options.
>
> *File read*: Both cp and sha1sum seem to read up to the point where the
> node fails and then exit without error. sha1sum reports an incorrect
> hash, and cp copies only part of the file. In the GlusterFS client logs
> I see errors indicating the node failure, but the commands themselves
> report nothing.
>
> *File write*: In the write case the situation is slightly better, as cp
> reports that the endpoint is not connected and then fails:
>
> # cp testfile /gluster/; echo $?
> cp: writing `testfile': Transport endpoint is not connected
> cp: closing `testfile': Transport endpoint is not connected
> 1
>
> Another interesting detail: in the client log I see that the file gets
> reopened when the storage node comes back online:
>
> [2011-04-16 14:03:04.909540] I
> [client-handshake.c:407:client3_1_reopen_cbk] test-client-1: reopen on
> /testfile succeeded (remote-fd = 0)
> [2011-04-16 14:03:04.909782] I
> [client-handshake.c:407:client3_1_reopen_cbk] test-client-1: reopen on
> /testfile succeeded (remote-fd = 1)
>
> However, by then the command has already finished. What is the purpose
> of this reopen?
>
> Is this the expected behavior? Could you please provide pointers to
> documentation, if such documentation exists?
>
> Is it possible to tune this behavior to be more NFS-like, i.e. put
> processes into I/O wait until the node comes back?
>
> Thanks in advance
>
> --
> Emir Imamagic
> www.srce.hr
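For anyone following along: the "files on the failed node will not be visible" behavior the quoted DHT document describes follows from how a distribute volume places files. Here is a rough sketch of the idea; the real DHT translator hashes names into per-directory layout ranges, and crc32 is used here only for illustration (brick names match the volume above):

```python
import zlib

bricks = ["gluster1:/data", "gluster2:/data"]

def brick_for(filename: str) -> str:
    # Toy stand-in for DHT placement: hash the file name, pick a
    # brick. Every file lives entirely on exactly one brick.
    return bricks[zlib.crc32(filename.encode()) % len(bricks)]

# If the node holding a file's brick fails, that file disappears
# from the mount; files hashed to the surviving brick stay visible.
for name in ("testfile", "another.dat"):
    print(name, "->", brick_for(name))
```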
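The read failure mode described above (early EOF instead of an I/O error) can be illustrated with a minimal sketch; the file contents are assumed, not taken from the test:

```python
import hashlib

# Simulate a file whose second half sits past the point where the
# node went down: the client delivers an early EOF rather than an
# error, so checksumming tools finish "successfully".
data = b"A" * (1024 * 1024)          # the whole file
truncated = data[: len(data) // 2]   # what a silent short read returns

full_hash = hashlib.sha1(data).hexdigest()
short_hash = hashlib.sha1(truncated).hexdigest()

# sha1sum would print short_hash and exit 0 - nothing signals that
# only half of the file was actually read.
print(full_hash == short_hash)  # False
```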
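On the reopen log messages: one plausible reading is the general reconnect pattern where a client remembers which file descriptors were open when the link dropped and re-issues opens on reconnect, so a still-running process can keep using its handles. This is a hypothetical sketch of that pattern; the class and method names are invented for illustration and are not GlusterFS internals:

```python
class ReconnectingClient:
    """Sketch of fd re-opening after a transport reconnect."""

    def __init__(self):
        self.open_fds = {}  # local fd -> path of every file still open

    def open(self, fd, path):
        self.open_fds[fd] = path

    def close(self, fd):
        self.open_fds.pop(fd, None)

    def on_reconnect(self, reopen):
        # Re-establish every fd that was open when the link dropped.
        # If the owning process already exited (as in the test above),
        # the reopens succeed but nothing ever uses the handles again.
        for fd, path in self.open_fds.items():
            reopen(fd, path)

client = ReconnectingClient()
client.open(0, "/testfile")
client.open(1, "/testfile")
reopened = []
client.on_reconnect(lambda fd, path: reopened.append((fd, path)))
print(reopened)  # [(0, '/testfile'), (1, '/testfile')]
```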