concurrent "gluster volume status" crashes the command (v3.4 and v3.7)

Dear list,

Running "gluster volume status" concurrently on all 3 GlusterFS nodes (which are actually LXC containers) somehow breaks the command. Two nodes reply "Another transaction is in progress. Please try again after sometime." and on the 3rd node the command hangs forever. Aborting the hanging command and running it again also yields "Another transaction is in progress. Please try again after sometime." on that machine.
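
For reference, this is roughly how we trigger it (a minimal sketch; the hostnames node1..node3 are placeholders for our three containers):

# Fire the status command on all three peers at (roughly) the same time.
for host in node1 node2 node3; do
    ssh -n "$host" 'gluster volume status' &
done
wait    # two invocations error out, the third hangs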

An strace of the command ends like this (the "Another transaction is in progress" message near the end is the CLI's own stderr interleaved with the trace output):

[...]
connect(7, {sa_family=AF_LOCAL, sun_path="/var/run/gluster/quotad.socket"}, 110) = -1 ENOENT (No such file or directory)
fcntl(7, F_GETFL)                       = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(7, F_SETFL, O_RDWR|O_NONBLOCK)    = 0
epoll_ctl(3, EPOLL_CTL_ADD, 7, {EPOLLIN|EPOLLPRI|EPOLLOUT|EPOLLONESHOT, {u32=1, u64=4294967297}}) = 0
pipe([8, 9])                            = 0
fcntl(9, F_SETFD, FD_CLOEXEC)           = 0
pipe([10, 11])                          = 0
fcntl(10, F_GETFL)                      = 0 (flags O_RDONLY)
fstat(10, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f67780e5000
lseek(10, 0, SEEK_CUR)                  = -1 ESPIPE (Illegal seek)
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f67780d9a50) = 28493
close(-1)                               = -1 EBADF (Bad file descriptor)
close(11)                               = 0
close(-1)                               = -1 EBADF (Bad file descriptor)
close(9)                                = 0
read(8, "", 4)                          = 0
close(8)                                = 0
read(10, "gsyncd.py 0.0.1\n", 4096)     = 16
wait4(28493, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 28493
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=28493, si_status=0, si_utime=5, si_stime=1} ---
close(10)                               = 0
munmap(0x7f67780e5000, 4096)            = 0
close(-1)                               = -1 EBADF (Bad file descriptor)
close(-2)                               = -1 EBADF (Bad file descriptor)
close(-1)                               = -1 EBADF (Bad file descriptor)
mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f6773545000
mprotect(0x7f6773545000, 4096, PROT_NONE) = 0
clone(child_stack=0x7f6773d44f70, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f6773d459d0, tls=0x7f6773d45700, child_tidptr=0x7f6773d459d0) = 28496
mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f6772d44000
mprotect(0x7f6772d44000, 4096, PROT_NONE) = 0
clone(child_stack=0x7f6773543f70, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f67735449d0, tls=0x7f6773544700, child_tidptr=0x7f67735449d0) = 28497
futex(0x7f67735449d0, FUTEX_WAIT, 28497, NULLAnother transaction is in progress. Please try again after sometime.
 <unfinished ...>
+++ exited with 1 +++

I had to stop all volumes and restart glusterd to resolve the problem.
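
Concretely, the recovery looked roughly like this on each node (a sketch; <volname> is a placeholder and has to be repeated for every volume, and the service name is as packaged on Ubuntu 14.04):

gluster --mode=script volume stop <volname>    # --mode=script skips the y/n prompt
service glusterfs-server restart               # restarts the glusterd management daemon
gluster volume start <volname>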

Host OS: Ubuntu 14.04 LTS
LXC OS:  Ubuntu 14.04 LTS


We first hit this issue with 3.4.2 (the official Ubuntu package) and upgraded to 3.7.5 (from Launchpad) to check whether the problem still exists. It does. Any ideas?
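
In case it helps with diagnosis, this is how we look for the stuck cluster lock on a node (a sketch; the log path is the default one for the Ubuntu packages):

grep -i 'lock' /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | tail -20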

Thank you for your help,
Florian