--- On Wed, 12/31/08, Raghavendra G <raghavendra.hg@xxxxxxxxx> wrote:

> On Wed, Dec 31, 2008 at 9:03 AM, Martin Fick <mogulguy@xxxxxxxxx> wrote:
> > The simplest case seems to be # 1, simply wait for the connection to
> > reestablish itself and retry to submit the protocol to the wire. I hacked a
> > simple implementation of this (looping in protocol_client_xfer until the
> > connection is reestablished without holding the lock) which seems to work,
> > but I have no clue if it is correct. ;) I will attach it below.
> >
> blocking the protocol_client_xfer till the server comes up is not good
> always. It may not make any difference in a simple client/server setup. But
> in a setup consisting of cluster translators, say afr, this would lead to
> glusterfs getting blocked on trying to send requests to the server which is
> down, though the request can be fulfilled from the other server(s) which
> is(are) up.

Good point. If I understand you correctly, AFR does not process requests
to its subvolumes in parallel, and therefore would not know when to consider
a subvolume "down" and move on to the next one?

Do you foresee any other problems if this blocking behavior were optional
and simply not used with AFR?

Thanks for your comments,

-Martin
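
For illustration only, here is a minimal Python sketch of the optional-blocking idea discussed above. It is not GlusterFS code: `Client`, `xfer`, `block_on_disconnect`, `max_wait` and `replicated_read` are hypothetical names, and the sequential failover loop is an assumption about how a replicating translator might behave, not a description of AFR.

```python
import time


class Disconnected(Exception):
    """Raised when a request cannot be sent because the link is down."""


class Client:
    """Hypothetical stand-in for one client connection to a server."""

    def __init__(self, name, connected=True, block_on_disconnect=False):
        self.name = name
        self.connected = connected
        # Assumed per-client option: wait for a reconnect instead of failing.
        self.block_on_disconnect = block_on_disconnect

    def xfer(self, request, poll_interval=0.01, max_wait=None):
        """Send a request, optionally waiting for a reconnect first."""
        waited = 0.0
        while not self.connected:
            if not self.block_on_disconnect:
                # Non-blocking mode: fail fast so the caller can fail over.
                raise Disconnected(self.name)
            if max_wait is not None and waited >= max_wait:
                # Blocking mode with a bound: give up after max_wait seconds.
                raise Disconnected(self.name)
            time.sleep(poll_interval)
            waited += poll_interval
        return f"{self.name} handled {request}"


def replicated_read(subvolumes, request):
    """Try each subvolume in order, moving on as soon as one is down."""
    for client in subvolumes:
        try:
            return client.xfer(request)
        except Disconnected:
            continue
    raise Disconnected("all subvolumes down")
```

With blocking left off, `replicated_read([Client("server1", connected=False), Client("server2")], "read")` skips the down server and is answered by `server2`. With `block_on_disconnect=True` and no `max_wait` on the first client, the same call would wait on `server1` indefinitely, which is the concern raised in the quoted reply.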