Poor error handling in glusterd-op-sm

Atin Mukherjee <amukherj@xxxxxxxxxx> · Mon, 29 Sep 2014 17:36:19 +0530

Folks,

While debugging the stale mgmt v3 lock issues that surfaced in
different rebalance test cases (mainly RHSC testing & BVT), I found
a few buggy places which are, or might be, causing this problem. Some of
them are listed below:

1. During the locking phase, if we somehow fail to get the lock
response back from the other nodes (probably because of a timeout), we
never release the lock taken on the volume.

2. The locking/brick op code doesn't have any error handling; it never
injects a failure event, so the state machine is left in an
incomplete state.

3. IMO, no more than one thread should be in SM transaction processing
mode at a time, but looking at the code I feel this is not 100% safe.
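
To make points 1 and 2 concrete, here is a minimal sketch of the
behaviour I would expect. This is NOT glusterd code: the states,
events and function names (txn_t, txn_handle, txn_on_lock_reply, and
so on) are hypothetical and only illustrate the idea that every
failure, including a timeout, is turned into an event, and that the
failure path always releases the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states and events, for illustration only. */
typedef enum { ST_DEFAULT, ST_LOCK_SENT, ST_LOCKED } sm_state_t;
typedef enum { EV_START, EV_ALL_ACC, EV_RCVD_RJT, EV_TIMEOUT, EV_OP_DONE } sm_event_t;

typedef struct {
    sm_state_t state;
    bool       lock_held;   /* lock taken on the volume */
} txn_t;

void txn_init(txn_t *t)
{
    t->state = ST_DEFAULT;
    t->lock_held = false;
}

/* Release the lock and return to the default state. */
void txn_unwind(txn_t *t)
{
    t->lock_held = false;
    t->state = ST_DEFAULT;
}

/* Handle one event. Failure events always unwind the transaction,
 * so the lock is never left held once we are back in ST_DEFAULT. */
void txn_handle(txn_t *t, sm_event_t ev)
{
    switch (t->state) {
    case ST_DEFAULT:
        if (ev == EV_START) {
            t->lock_held = true;       /* local lock acquired */
            t->state = ST_LOCK_SENT;   /* lock requests sent to peers */
        }
        break;
    case ST_LOCK_SENT:
        if (ev == EV_ALL_ACC)
            t->state = ST_LOCKED;
        else if (ev == EV_RCVD_RJT || ev == EV_TIMEOUT)
            txn_unwind(t);             /* point 1: release on failure */
        break;
    case ST_LOCKED:
        if (ev == EV_OP_DONE || ev == EV_RCVD_RJT || ev == EV_TIMEOUT)
            txn_unwind(t);
        break;
    }
}

/* Called (by an assumed RPC layer) when the lock replies arrive or the
 * timer expires. Point 2: an error is converted into a failure event
 * instead of being dropped silently. */
void txn_on_lock_reply(txn_t *t, int ret, bool timed_out)
{
    if (timed_out)
        txn_handle(t, EV_TIMEOUT);
    else if (ret != 0)
        txn_handle(t, EV_RCVD_RJT);
    else
        txn_handle(t, EV_ALL_ACC);
}
```

With this shape, a lock reply that times out ends with the state back
at ST_DEFAULT and lock_held cleared, rather than a stale lock. Point 3
(serialising access from multiple threads) is deliberately left out of
the sketch.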

Given the above issues, I am wondering whether it is worth spending
effort on fixing them, or whether the safest option would be to move the
rebalance code into the sync-op framework.

Your feedback will be appreciated.

Regards,
Atin
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-devel