Re: gluster volume stop and the regressions

In brick-mux mode, volume stop reveals a race when run with my patch [1].
Although this behavior is 100% reproducible with my patch, that by no means implies that the patch is buggy.

In brick-mux mode, during volume stop, glusterd sends a brick-detach message to the brick process for the last brick. The brick process responds to glusterd with an acknowledgment and then kills itself with a SIGTERM signal. All this sounds fine. However, glusterd somehow receives a socket disconnect notification before it sees the response. glusterd then presumes that something went wrong during volume stop and fails the operation, which in turn fails the test.
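
To make the ordering problem concrete, here is a minimal sketch in Python. It is only an illustration of the sequence described above, not glusterd code: the `Event` names and the `stop_outcome` function are hypothetical, and the rule "a disconnect seen before the acknowledgment means failure" is my reading of the observed behavior.

```python
from enum import Enum


class Event(Enum):
    # Hypothetical event names, used only for this illustration.
    DETACH_ACK = "detach_ack"  # brick process acknowledged the brick-detach
    DISCONNECT = "disconnect"  # socket to the brick process was closed


def stop_outcome(events):
    """Return the volume stop result implied by the order of events.

    Assumption: whichever of the two events is seen first decides
    the outcome, which is what makes the operation order-sensitive.
    """
    for event in events:
        if event is Event.DETACH_ACK:
            return "success"
        if event is Event.DISCONNECT:
            return "failed"
    return "pending"


# Expected order: the acknowledgment arrives, then the process exits.
expected_order = [Event.DETACH_ACK, Event.DISCONNECT]

# Racy order: the disconnect is noticed before the acknowledgment.
racy_order = [Event.DISCONNECT, Event.DETACH_ACK]

print(stop_outcome(expected_order))
print(stop_outcome(racy_order))
```

The same two events produce different results depending only on which one is seen first, which is why the failure looks like a race rather than a logic error in the stop path.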

This race can be reproduced by running the test tests/basic/distribute/rebal-all-nodes-migrate.t in brick-mux mode with my patch [1] applied.

[1] https://review.gluster.org/19308


On Thu, Feb 1, 2018 at 9:54 AM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
I don't think that's the right way. Ideally the test shouldn't attempt to stop a volume while a rebalance session is in progress. If the test checks the rebalance status and waits up to 30 seconds for it to finish, and volume stop still fails with a "rebalance session in progress" error, that means either (a) the rebalance session took longer than the timeout passed to EXPECT_WITHIN or (b) there's a bug in the code.

On Thu, Feb 1, 2018 at 9:46 AM, Milind Changire <mchangir@xxxxxxxxxx> wrote:
If a *volume stop* fails at a user's production site with a reason like *rebalance session is active*, the admin will wait for the session to complete and then reissue the *volume stop*.

So, in essence, a failed volume stop is not fatal. For the regression tests, I would like to propose changing a single volume stop to *EXPECT_WITHIN 30*, so that a volume stop is termed fatal in the regression scenario only if the volume cannot be stopped even after 30 seconds.
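
The retry idea behind the proposal can be sketched in Python as follows. This is not the test framework's EXPECT_WITHIN implementation; `retry_within` and its parameters are hypothetical names chosen for illustration, and the one-second polling interval is an assumption.

```python
import time


def retry_within(timeout_seconds, action, interval_seconds=1.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Call action() until it returns True or the timeout expires.

    Returns True as soon as action() succeeds, and False if it is
    still failing once timeout_seconds have elapsed. The action is
    always attempted at least once.
    """
    deadline = clock() + timeout_seconds
    while True:
        if action():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_seconds)


# Stand-in for "try to stop the volume": fails twice, then succeeds.
attempts = []


def try_stop():
    attempts.append(1)
    return len(attempts) >= 3


stopped = retry_within(30, try_stop, interval_seconds=0)
print(stopped, len(attempts))
```

In a real test the action would be the volume stop itself, and only a False result after the full 30 seconds would be treated as fatal.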

Any comments on the proposal?

--
Milind


_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel




--
Milind
