Re: gluster volume stop and the regressions

Milind Changire <mchangir@xxxxxxxxxx> · Wed, 14 Feb 2018 12:30:28 +0530

The volume stop in brick-mux mode reveals a race with my patch [1].
Although this behavior is 100% reproducible with my patch, this by no means implies that my patch is buggy.

In brick-mux mode, during volume stop, glusterd sends a brick-detach message to the brick process for the last brick. The brick process responds to glusterd with an acknowledgment and then kills itself with a SIGTERM signal. All this sounds fine. However, the response somehow does not reach glusterd first. Instead, a socket disconnect notification reaches glusterd before the response. glusterd therefore presumes that something has gone wrong during volume stop and fails the volume stop operation, which causes the test to fail.
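To make the ordering problem concrete, here is a toy Python model. It assumes, as a simplification, that glusterd acts on whichever of the two events it processes first. The names `Event`, `DETACH_ACK`, `DISCONNECT` and `volume_stop_outcome` are hypothetical and do not correspond to glusterd's actual code.

```python
from enum import Enum, auto


class Event(Enum):
    # Hypothetical event kinds, not glusterd identifiers.
    DETACH_ACK = auto()   # the brick's reply to the brick-detach request
    DISCONNECT = auto()   # socket disconnect notification for the brick


def volume_stop_outcome(events):
    """Toy model: glusterd handles events strictly in delivery order.

    Returns "success" if the detach acknowledgment is seen before the
    disconnect, and "failed" otherwise.
    """
    for event in events:
        if event is Event.DETACH_ACK:
            return "success"
        if event is Event.DISCONNECT:
            # No acknowledgment seen yet, so the stop is presumed to have failed.
            return "failed"
    return "failed"  # neither event was delivered


# Expected order: the ack is processed, then the brick process exits.
print(volume_stop_outcome([Event.DETACH_ACK, Event.DISCONNECT]))
# Racy order: the disconnect is processed before the ack.
print(volume_stop_outcome([Event.DISCONNECT, Event.DETACH_ACK]))
```

The brick process sends the same messages in both cases. Only the order in which glusterd processes them differs, which is why the failure depends on timing rather than on the brick's behavior.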

This race is reproducible by running the test tests/basic/distribute/rebal-all-nodes-migrate.t in brick-mux mode with my patch [1] applied.

[1] https://review.gluster.org/19308

On Thu, Feb 1, 2018 at 9:54 AM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
I don't think that's the right way. Ideally the test shouldn't attempt to stop a volume while a rebalance session is in progress. Suppose we check the rebalance status and wait up to 30 seconds for it to finish, and volume stop still fails with a "rebalance session in progress" error. That means either (a) the rebalance session took longer than the timeout passed to EXPECT_WITHIN, or (b) there's a bug in the code.

On Thu, Feb 1, 2018 at 9:46 AM, Milind Changire <mchangir@xxxxxxxxxx> wrote:
If a *volume stop* fails at a user's production site with a reason like *rebalance session is active*, the admin will wait for the session to complete and then reissue the *volume stop*.

So, in essence, a failed volume stop is not fatal. For the regression tests, I would like to propose changing a single volume stop to *EXPECT_WITHIN 30*. That way, if a volume cannot be stopped even after 30 seconds, the failure can be treated as fatal in the regression scenario.
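Below is a minimal Python sketch of the proposed retry semantics. It assumes EXPECT_WITHIN behaves roughly like "retry the check until it passes or the timeout expires". `expect_within` and `FakeVolume` are hypothetical stand-ins. They are not the test framework's implementation and not gluster's CLI.

```python
import time
from typing import Callable


def expect_within(timeout_s: float, action: Callable[[], bool],
                  interval_s: float = 1.0) -> bool:
    """Retry `action` until it returns True or `timeout_s` seconds pass.

    Returns True on success and False if the deadline expires, which is
    the case the proposal would treat as fatal.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if action():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval_s, remaining))


class FakeVolume:
    """Hypothetical volume whose stop is rejected while a rebalance runs."""

    def __init__(self, rebalance_attempts_left: int):
        self.rebalance_attempts_left = rebalance_attempts_left
        self.stopped = False

    def stop(self) -> bool:
        if self.rebalance_attempts_left > 0:
            # Rebalance still active: reject, but assume it makes progress.
            self.rebalance_attempts_left -= 1
            return False
        self.stopped = True
        return True


vol = FakeVolume(rebalance_attempts_left=3)
print(expect_within(30, vol.stop, interval_s=0.01), vol.stopped)
```

A stop that is rejected only while a rebalance is still finishing eventually succeeds within the window. Only a stop that never succeeds before the deadline is reported as a failure.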

Any comments on the proposal?

-- 
Milind

_______________________________________________

Gluster-devel mailing list

Gluster-devel@xxxxxxxxxxx

http://lists.gluster.org/mailman/listinfo/gluster-devel

-- 
Milind