--- On Thu, 1/1/09, Krishna Srinivas <krishna@xxxxxxxxxxxxx> wrote:

> > <mogulguy@xxxxxxxxx> wrote:
> > I am a bit curious about the new HA translator and how
> > it is supposed to work. I have looked through the code a
> > bit and this is my naive interpretation of the way it is
> > designed:
> >
> > It appears that the HA translator keeps track of its
> > subvolumes and whether they are active or not. When
> > attempting to dispatch a request, it picks a currently
> > active subvolume or fails if none are currently active. It
> > appears that once it has chosen an active subvolume for a
> > request, it can no longer fail over to another subvolume for
> > that particular request, is this true? If so, then it is
> > possible (perhaps likely) that during failovers some
> > requests will fail before failover happens even if certain
> > subvolumes never go down? :( Is this correct or am I
> > missing something?
>
> No. Requests are retried on the next subvolume if the
> current one goes down during the operation, so it should
> work fine.

Hmm, I don't see this looping on failure in the code, but my
understanding of the translator design is fairly minimal. I will
have to look harder.

I was hoping to modify the subvolume looping so that it loops back
on itself indefinitely if all the subvolumes fail. If this could be
done, it seems like an easy way to achieve NFS-style blocking when
the server is down (see my other thread on this), simply by using
the HA translator with only one subvolume.

Also, how about failures due to replies that never return because
the link is down? Are requests saved after they are sent, until the
reply arrives, so that they can be resent on the other link if the
original link successfully sends the request but then goes down and
cannot receive the reply?

Thanks and Happy New Year,

-Martin
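
To make the behaviour asked about above concrete, here is a minimal sketch of the retry loop in Python. It is not the HA translator's code and makes no claim about how GlusterFS implements it; `dispatch`, `SubvolumeDown`, `block`, and `retry_delay` are hypothetical names chosen for illustration. Each subvolume is modelled as a callable that either returns a reply or raises `SubvolumeDown`. The request is held by the caller until a reply arrives, so it can be resent even if a link goes down after sending it.

```python
import time
from typing import Any, Callable, Sequence


class SubvolumeDown(Exception):
    """Hypothetical error: the subvolume or its link failed before a reply arrived."""


def dispatch(
    request: Any,
    subvolumes: Sequence[Callable[[Any], Any]],
    *,
    block: bool = False,
    retry_delay: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Send `request` to each subvolume in turn until one of them replies.

    With block=False, raise SubvolumeDown once every subvolume has failed.
    With block=True, loop back to the first subvolume and keep retrying
    indefinitely (the NFS-style blocking idea from the message above).
    """
    if not subvolumes:
        raise ValueError("at least one subvolume is required")
    while True:
        for send in subvolumes:
            try:
                # `request` stays in scope until a reply is returned,
                # so a lost reply simply leads to a resend.
                return send(request)
            except SubvolumeDown:
                continue  # fail over to the next subvolume
        if not block:
            raise SubvolumeDown("all subvolumes failed")
        sleep(retry_delay)  # pause before looping back to the first subvolume
```

With a single subvolume, `dispatch(req, [only_server], block=True)` would block until that server answers. Note that blindly resending a request whose reply was lost is only safe when the operation is idempotent.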