----- Original Message -----
> From: "Avra Sengupta" <asengupt@xxxxxxxxxx>
> To: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> Sent: Monday, February 29, 2016 5:20:53 PM
> Subject: Gap in protocol client-server handshake
>
> Hi,
>
> Currently, on a successful connection between protocol server and client,
> the protocol client initiates a CHILD_UP event in the client stack. At
> this point in time, only the connection between server and client is
> established, and there is no guarantee that the server side stack is
> ready to serve requests. It works fine today, as most server side
> translators do not depend on any other factors before being able to
> serve requests, and hence they are up by the time the client stack
> translators receive the CHILD_UP (initiated by client handshake).
>
> The gap here is exposed when certain server side translators, like
> NSR-Server for example, have a couple of protocol clients as their
> children (connecting them to other bricks), and they can't really serve
> requests till a quorum of their children are up. Hence these translators
> *should* defer sending CHILD_UP till they have enough children up, and
> the same CHILD_UP event needs to be propagated to the client stack
> translators.

Yes. We have seen this problem (mostly in the form of brick process crashes).

>
> I have sent a patch (http://review.gluster.org/#/c/13549/) addressing
> this, where we maintain a child_up variable in both the protocol client
> and protocol server translators. The protocol server updates this value
> based on the CHILD_UP and CHILD_DOWN events it receives from the
> translators below it. On receiving such an event it forwards that event
> to the client. The protocol client on receiving such an event forwards
> it up the client stack, thereby letting the client translators correctly
> know that the server is up and ready to serve.
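
To make the flow described above easier to follow, here is a minimal sketch in Python. It is a toy model, not the code in the patch: the class and method names (ProtocolServer, ProtocolClient, attach, notify, on_server_event) are made up for illustration, and only the child_up variable and the CHILD_UP / CHILD_DOWN events come from the description.

```python
from enum import Enum
from typing import List


class Event(Enum):
    CHILD_UP = "CHILD_UP"
    CHILD_DOWN = "CHILD_DOWN"


class ProtocolClient:
    """Toy stand-in for the protocol client translator (hypothetical API)."""

    def __init__(self) -> None:
        self.child_up = False
        # Events forwarded up the client stack, in order of arrival.
        self.delivered: List[Event] = []

    def on_server_event(self, event: Event) -> None:
        # Mirror the server's state, then pass the event up the client stack.
        self.child_up = event is Event.CHILD_UP
        self.delivered.append(event)


class ProtocolServer:
    """Toy stand-in for the protocol server translator (hypothetical API)."""

    def __init__(self) -> None:
        self.child_up = False
        self.clients: List[ProtocolClient] = []

    def attach(self, client: ProtocolClient) -> None:
        # A connection alone does not trigger CHILD_UP in this model.
        self.clients.append(client)

    def notify(self, event: Event) -> None:
        # Called when a translator below reports CHILD_UP or CHILD_DOWN.
        self.child_up = event is Event.CHILD_UP
        for client in self.clients:
            client.on_server_event(event)


server = ProtocolServer()
client = ProtocolClient()
server.attach(client)           # connected, but client.child_up stays False
server.notify(Event.CHILD_UP)   # server stack is ready, client stack is told
```

The point of the sketch is that attach() by itself leaves the client's child_up unset; the client stack only hears CHILD_UP once the translators below the server have reported it.
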
>
> The clients connecting later (long after a server has initialized and
> processed its CHILD_UP events) receive a child_up status as part of
> the handshake, and based on the status of the server's child_up, either
> propagate a CHILD_UP event or defer it.
>
> Please have a look at the patch, and kindly state if you have any
> concerns or you foresee any scenarios of interest which we might have
> missed.

Thanks for the patch. I'll review it.

>
> Regards,
> Avra
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-devel