Hi,
I came across a situation where a few IOs were sent to a subvolume that was no longer available. It happens for the following reason.
During the remove-brick commit, three things happen: the brick is stopped, the volfiles are created, and the volfile change is notified to the clients.
The order in which this currently happens is:
1) the brick is stopped.
2) the volfiles are created and then the notification goes to the clients.
This leaves a window between the brick being stopped and the clients being notified that it has been stopped.
During this window the brick is unavailable, but IO still goes to the stopped brick because the clients are not yet aware of the volfile change. This results in IO failures.
So I feel it's better to do it in the following order:
1) create the volfile.
2) notify the client.
3) stop the brick.
This way the clients are notified first and IO starts going to the right subvol while the brick is still available. The brick is stopped only after that, which closes the window.
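To make the difference between the two orderings concrete, here is a small toy model in Python. It is only a sketch and not glusterd code: the class, the step names and the "one client IO after each step" rule are all hypothetical, and it assumes the client switches to the new volfile as soon as it is notified, which the real asynchronous notification does not guarantee.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy state for one brick being removed and one client (hypothetical model)."""
    brick_running: bool = True       # is the brick being removed still up?
    client_uses_brick: bool = True   # does the client's volfile still include it?
    failed_ios: int = 0
    log: list = field(default_factory=list)

    def client_io(self):
        # An IO fails only if the client still routes to a brick that is down.
        if self.client_uses_brick and not self.brick_running:
            self.failed_ios += 1

    def stop_brick(self):
        self.brick_running = False
        self.log.append("stop_brick")

    def create_volfile(self):
        # Creating the volfile alone does not change what the client sees.
        self.log.append("create_volfile")

    def notify_client(self):
        # Assumption: the client switches to the new volfile immediately.
        self.client_uses_brick = False
        self.log.append("notify_client")


def run_commit(order):
    """Run the commit steps in the given order, issuing one client IO after each."""
    cluster = Cluster()
    for step in order:
        getattr(cluster, step)()
        cluster.client_io()
    return cluster


CURRENT_ORDER = ["stop_brick", "create_volfile", "notify_client"]
PROPOSED_ORDER = ["create_volfile", "notify_client", "stop_brick"]

if __name__ == "__main__":
    for name, order in (("current", CURRENT_ORDER), ("proposed", PROPOSED_ORDER)):
        result = run_commit(order)
        print(name, "failed IOs:", result.failed_ios)
```

In this model the current order loses the IOs issued before the notification, while the proposed order loses none. In practice the brick stop would also have to wait until the clients have actually picked up the new volfile, which the sketch does not model.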
As this change touches basic functionality, I thought of bringing it to everyone's notice here.
If you find anything that could break because of this change, or feel there is a better way to handle this, do let me know.
Thanks to Du, Atin, Kaushal and Nithya for helping me with this.
Regards,
Hari.
_______________________________________________ Gluster-devel mailing list Gluster-devel@xxxxxxxxxxx http://lists.gluster.org/mailman/listinfo/gluster-devel