Krishnan Parthasarathi <kparthas@xxxxxxxxxx> wrote:

> The scheduling of a paused task happens when the epoll thread receives a
> POLLIN event along with the response from the remote endpoint. This is
> contingent on the fact that the call back must issue a synctask_wake,
> which will trigger the resumption of the task (in one of the threads from
> the syncenv). In summary, the call back code triggers the scheduling back
> of the paused task.

Right, this seems to work. I found the __wake() call at the end of
_gd_syncop_brick_op_cbk() and it is executed. The problem is therefore
not there.

I tried running the test steps one by one. The offending command is
"gluster volume heal $V0 info", so I ran it between each step. It works
at the beginning, it works if I kill 3 out of 6 bricks, and it hangs
after I create files in the volume (with 3 out of 6 bricks down).

At that point, the bricks that are still up show this in their logs:

[2014-09-11 17:47:31.452067] I [server.c:518:server_rpc_notify] 0-patchy-server: disconnecting connection from netbsd0.cloud.gluster.org-24431-2014/09/11-17:40:47:719843-patchy-client-1-0-0
[2014-09-11 17:47:31.452142] I [server-helpers.c:290:do_fd_cleanup] 0-patchy-server: fd cleanup on /a/a/a/a/a/a/a/a/a/a
[2014-09-11 17:47:31.452689] I [client_t.c:417:gf_client_unref] 0-patchy-server: Shutting down connection netbsd0.cloud.gluster.org-24431-2014/09/11-17:40:47:719843-patchy-client-1-0-0
[2014-09-11 17:47:31.455145] I [server.c:518:server_rpc_notify] 0-patchy-server: disconnecting connection from netbsd0.cloud.gluster.org-3612-2014/09/11-17:40:28:979958-patchy-client-1-0-0
[2014-09-11 17:47:31.455172] I [client_t.c:417:gf_client_unref] 0-patchy-server: Shutting down connection netbsd0.cloud.gluster.org-3612-2014/09/11-17:40:28:979958-patchy-client-1-0-0
[2014-09-11 17:47:31.455208] I [server.c:518:server_rpc_notify] 0-patchy-server: disconnecting connection from netbsd0.cloud.gluster.org-26218-2014/09/11-17:40:28:900316-patchy-client-1-0-0
[2014-09-11 17:47:31.455230] I [client_t.c:417:gf_client_unref] 0-patchy-server: Shutting down connection netbsd0.cloud.gluster.org-26218-2014/09/11-17:40:28:900316-patchy-client-1-0-0

If I understood correctly, "gluster volume heal $V0 info" causes glusterd
to send requests to the bricks that are alive. If those bricks drop the
connection at that moment, it may explain why the command hangs: the
reply never arrives, so the callback that would wake the paused task is
never invoked.

What is the correct behavior here?

--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@xxxxxxxxxx