Hello. My name is Víctor and I would like to ask about some tests I have been doing with glusterfs. We are a bio-search company, and we are thinking of using glusterfs for one of our projects. I am testing with two servers and one client, using AFR (cluster/afr) replication defined on the client side. I have read in the glusterfs documentation that this kind of setup should provide high availability if one server goes down. My configuration files are:

Client configuration file (CENTRAL), CLIENT:

volume sargasv0
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.60
  option remote-port 6996
  option remote-subvolume v0
end-volume

volume shedirv4
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.1.61
  option remote-port 6996
  option remote-subvolume v4
end-volume

volume mirror0
  type cluster/afr
  subvolumes sargasv0 shedirv4
end-volume

Server configuration file (SARGAS), SERVER1:

volume v0
  type storage/posix
  option directory /tmp/export0
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option listen-port 6996
  option auth.ip.v0.allow *
  subvolumes v0
end-volume

Server configuration file (SHEDIR), SERVER2:

volume v4
  type storage/posix
  option directory /tmp/export4
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  option listen-port 6996
  option auth.ip.v4.allow *
  subvolumes v4
end-volume

I have done some tests playing a movie (*.avi) file stored on my glusterfs mounted directory with Totem Movie Player. Further tests with VLC media player gave identical results. I am running Ubuntu 7.10. After starting the glusterfs setup with the two servers and the client, I copied the avi file from my home directory to the glusterfs mount on the client. The file was copied correctly and was replicated to both servers.
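One way to double-check that the two replicas really are identical after the copy is to compare a checksum of the file in each export directory. This is a minimal Python 3 sketch, not a glusterfs tool; the function name `file_digest` and the choice of SHA-256 are assumptions for illustration. Run it on SARGAS against /tmp/export0/dc4.avi, on SHEDIR against /tmp/export4/dc4.avi, and on the client against the copy on the mount; matching digests mean the bytes match.

```python
import hashlib
import sys


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of `path`, reading it in 1 MiB chunks
    so that large video files are not loaded into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    # Usage: python3 file_digest.py FILE [FILE ...]
    for p in sys.argv[1:]:
        print("%s  %s" % (file_digest(p), p))
```

Reading in fixed-size chunks keeps memory use constant regardless of the size of the movie file.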
I ran the test in debug mode. When I unplugged one of the servers, I could keep watching the video after a pause of about 20-30 seconds while the client switched over to the remaining active server. That is high availability, but when I plugged the previously detached server back in and unplugged the other one, the player reported "could not read from resource" and the following lines appeared in the debug log:

2008-05-23 12:45:31 D [client-protocol.c:4750:client_protocol_reconnect] sargasv0: attempting reconnect
2008-05-23 12:45:31 D [tcp-client.c:77:tcp_connect] sargasv0: socket fd = 6
2008-05-23 12:45:31 D [tcp-client.c:107:tcp_connect] sargasv0: finalized on port `1023'
2008-05-23 12:45:31 D [common-utils.c:179:gf_resolve_ip] resolver: DNS cache not present, freshly probing hostname: 192.168.1.60
2008-05-23 12:45:31 D [common-utils.c:204:gf_resolve_ip] resolver: returning IP:192.168.1.60[0] for hostname: 192.168.1.60
2008-05-23 12:45:31 D [common-utils.c:212:gf_resolve_ip] resolver: flushing DNS cache
2008-05-23 12:45:31 D [tcp-client.c:161:tcp_connect] sargasv0: connect on 6 in progress (non-blocking)
2008-05-23 12:45:31 D [tcp-client.c:198:tcp_connect] sargasv0: connection on 6 still in progress - try later
2008-05-23 12:45:35 W [client-protocol.c:205:call_bail] shedirv4: activating bail-out. pending frames = 1. last sent = 2008-05-23 12:44:52. last received = 2008-05-23 12:44:52 transport-timeout = 42
2008-05-23 12:45:35 C [client-protocol.c:212:call_bail] shedirv4: bailing transport
2008-05-23 12:45:35 D [tcp.c:137:cont_hand] tcp: forcing poll/read/write to break on blocked socket (if any)
2008-05-23 12:45:35 W [client-protocol.c:4777:client_protocol_cleanup] shedirv4: cleaning up state in transport object 0x808bd90
2008-05-23 12:45:35 E [client-protocol.c:4827:client_protocol_cleanup] shedirv4: forced unwinding frame type(1) op(13) reply=@0xb6a00468
2008-05-23 12:45:35 E [client-protocol.c:3193:client_readv_cbk] shedirv4: no proper reply from server, returning ENOTCONN
2008-05-23 12:45:35 D [afr.c:2248:afr_readv_cbk] mirror0: reading from child 2
2008-05-23 12:45:35 E [afr.c:2262:afr_readv_cbk] mirror0: (path=/dc4.avi child=shedirv4) op_ret=-1 op_errno=107
2008-05-23 12:45:35 E [fuse-bridge.c:1551:fuse_readv_cbk] glusterfs-fuse: 182438: READ => -1 (107)
2008-05-23 12:45:35 D [tcp.c:87:tcp_disconnect] shedirv4: connection disconnected
2008-05-23 12:45:35 D [afr.c:5939:notify] mirror0: GF_EVENT_CHILD_DOWN from shedirv4
2008-05-23 12:45:35 D [fuse-bridge.c:1577:fuse_readv] glusterfs-fuse: 182439: READ (0xb6c01420, size=4096, offset=172892160)
2008-05-23 12:45:35 E [fuse-bridge.c:1551:fuse_readv_cbk] glusterfs-fuse: 182439: READ => -1 (107)
2008-05-23 12:45:35 D [fuse-bridge.c:1577:fuse_readv] glusterfs-fuse: 182440: READ (0xb6c01420, size=4096, offset=172892160)
2008-05-23 12:45:35 E [fuse-bridge.c:1551:fuse_readv_cbk] glusterfs-fuse: 182440: READ => -1 (107)
...

In this case I had to close the file and play it again. glusterfs then read the file from the active server and it played without problems. But if you repeat the test, plugging the previously unplugged server back in and unplugging the one that was active, the same error appears and the film stops again.
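To reproduce this without a media player, a small reader that keeps one descriptor open and reads the file sequentially (the log shows the player issuing 4096-byte reads) could be run against the file on the glusterfs mount while the servers are swapped. This is a Python 3 sketch, not part of glusterfs; the name `read_through`, the retry limit and the reopen-and-resume behaviour are assumptions for illustration. It records the offset and errno name of every failed read (107 is ENOTCONN on Linux, matching `op_errno=107` above) and, on ENOTCONN, imitates the manual workaround of closing and reopening the file, here resuming at the same offset.

```python
import errno
import os
import sys


def read_through(path, chunk_size=4096, max_reopens=3):
    """Read `path` from start to end through a single descriptor.

    Returns (bytes_read, errors), where errors is a list of
    (offset, errno_name) tuples, one per failed read.
    """
    errors = []
    offset = 0
    reopens = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            try:
                data = os.pread(fd, chunk_size, offset)
            except OSError as exc:
                name = errno.errorcode.get(exc.errno, str(exc.errno))
                errors.append((offset, name))
                # Only ENOTCONN is retried; any other error, or too many
                # reopens, ends the run.
                if exc.errno != errno.ENOTCONN or reopens >= max_reopens:
                    break
                # Workaround sketch: drop the stale descriptor, reopen the
                # file and resume reading at the same offset.
                os.close(fd)
                fd = -1  # so the finally block does not close it twice
                fd = os.open(path, os.O_RDONLY)
                reopens += 1
                continue
            if not data:  # end of file
                break
            offset += len(data)
    finally:
        if fd >= 0:
            os.close(fd)
    return offset, errors


if __name__ == "__main__":
    # Usage: python3 read_through.py <mountpoint>/dc4.avi
    total, errs = read_through(sys.argv[1])
    print("read %d bytes, errors: %r" % (total, errs))
```

Reopening only hides the error on the application side; it does not change how the glusterfs client handles the failover.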
Therefore, the very first time a server goes down it is possible to keep the file open and continue watching the video, but on the second and following failovers the read fails and the file has to be reopened. Is there a way to avoid this read error, so that the file stays open and the movie keeps playing after the failover to the active server happens a second time? Thank you for your help.