James,
It is planned for the later releases of 1.4. Let us wait for Avati's
reply regarding the timeframe.
Krishna

On Thu, Aug 28, 2008 at 7:03 PM, James E Warner <jwarner6@xxxxxxx> wrote:
> Thanks for the prompt reply. One final question: is the HA translator
> still planned for the upcoming 1.4 release, and if not, do you have a
> rough idea of which release it will go into?
>
> Thanks again,
>
> James Warner
> Computer Sciences Corporation
>
> From:    "Krishna Srinivas" <krishna@zresearch.com>
>          (sent by krishna.srinivas@gmail.com)
> To:      James E Warner/DEF/CSC@CSC
> cc:      gluster-devel@xxxxxxxxxx
> Date:    08/28/2008 01:03 AM
> Subject: Re: Server Side AFR gets transport endpoint is not connected
>
> On Thu, Aug 28, 2008 at 12:45 AM, James E Warner <jwarner6@xxxxxxx> wrote:
>>
>> Hi,
>>
>> I'm currently testing gluster to see if I can make it work for our HA
>> filesystem needs. In initial testing things seem to be very good,
>> especially with client-side AFR performing replication to our server
>> nodes. However, we would like to keep our client network free of
>> replication traffic, so I set up server-side AFR with three storage
>> bricks replicating data between themselves and round-robin DNS for
>> node failover. The round-robin DNS is working and the failover between
>> the nodes is kind of working, but if I pull the network cable on the
>> currently active server (the host that the glusterfs client is
>> connected to), the next filesystem operation (such as ls
>> /mnt/glusterfs) fails with a "transport endpoint is not connected"
>> error. Similarly, if I have a large copy operation in progress, the
>> copy will exit with a failure. All of the operations after that work
>> fine, and netstat shows that the client has failed over to the next
>> server in the list, but by that point the current filesystem operation
>> has already failed. Anyway, this leads me to a few questions:
>>
>> 0. Do my config files look OK, or does it look like I've configured
>> this thing incorrectly? :)
>>
>> 1. Is this the expected behavior or is this a bug? From reading the
>> mailing list I had the impression that on failure the operation would
>> be retried on the remaining IPs cached in the client's list, so I was
>> surprised that the operation failed. I think it is probably a bug, but
>> I could see an argument for how this might be considered normal
>> operation.
>
> That is the expected behavior.
>
>> 2. If this is expected behavior, is there any plan to change it in the
>> future, or is server-side AFR always expected to work this way?
>> I've seen references to round-robin DNS being an interim measure on
>> the mailing list, so I'm not sure if there is another translator in
>> the works or not. If there is something in the works, is it available
>> in the current glusterfs 1.4 snapshot releases, or is it planned for a
>> much later version?
>
> Yes, we plan to bring in an HA translator which will make this work
> fine.
>
>> 3. Can you think of any option that I might have missed that would
>> correct the problem and allow the currently running file operation to
>> succeed during a failover?
>>
>> 4. Once again, if this is as designed, can you explain the reason that
>> it works this way? As I said, I really expected it to fail over
>> transparently in much the same way that client-side AFR seems to, so I
>> was surprised that it didn't.
>
> If AFR is on the client side, it maintains connections to its
> subvolumes separately, so if one node fails it still has connections to
> the other subvolumes. However, if AFR is on the server side and that
> server goes down, the client cannot do anything about it. Once we bring
> the HA xlator into the picture, it sits on the client and can take care
> of seamless failover when the connection fails.
>
>> Since I hope that this is a bug, the configuration files and the
>> relevant sections of the client log are below. I have used this
>> configuration on the gluster 1.3.11 version and the latest snapshot
>> from August 27, 2008.
>>
>> Client Log Snippet:
>> ================
>>
>> 2008-08-27 12:53:34 D [fuse-bridge.c:839:fuse_err_cbk] glusterfs-fuse: 62:


_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel
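
For readers comparing the two layouts Krishna describes, a minimal
client-side AFR volume file might look like the sketch below. This is an
illustration only: the hostnames server1/server2 and the export name
"brick" are assumptions, not James's actual configuration (which is
truncated from the archive above).

    # client.vol -- sketch of client-side AFR, 1.3-style volfile syntax
    # (hostnames and export names are assumed for illustration)

    volume remote1
      type protocol/client
      option transport-type tcp/client
      # first storage brick
      option remote-host server1
      option remote-subvolume brick
    end-volume

    volume remote2
      type protocol/client
      option transport-type tcp/client
      # second storage brick
      option remote-host server2
      option remote-subvolume brick
    end-volume

    # AFR on the client holds a separate connection to each brick
    volume replicate
      type cluster/afr
      subvolumes remote1 remote2
    end-volume

With this layout the client keeps one protocol/client connection per
brick, so losing a server does not fail the operation in flight. With
server-side AFR the client volume file has only a single protocol/client
pointing at the round-robin DNS name, so whatever operation is in flight
when that one connection drops fails before the reconnect, which is the
behavior James observed and the gap the HA translator is meant to close.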