On Wed, 2016-04-20 at 16:24 +0530, Atin Mukherjee wrote:
> I should have said the regression link is irrelevant here. Try
> running this test on your local setup multiple times on mainline.
> I do believe you should see the crash.

I could see a coredump after running trash.t multiple times in a while loop.

Info from the coredump:

Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000040bd31 in glusterfs_handle_translator_op (req=0x7feab8001dec)
    at glusterfsd-mgmt.c:590
590             any = active->first;
[Current thread is 1 (Thread 0x7feac1657700 (LWP 12050))]

(gdb) l
585                 goto out;
586             }
587
588             ctx = glusterfsd_ctx;
589             active = ctx->active;
590             any = active->first;
591             input = dict_new ();
592             ret = dict_unserialize (xlator_req.input.input_val,
593                                     xlator_req.input.input_len,
594                                     &input);
(gdb) p ctx
$1 = (glusterfs_ctx_t *) 0x7fa010
(gdb) p ctx->active
$2 = (glusterfs_graph_t *) 0x0
(gdb) p *req
$1 = {trans = 0x7feab8000e20, svc = 0x83ca50, prog = 0x874810, xid = 1,
  prognum = 4867634, progver = 2, procnum = 3, type = 0, uid = 0, gid = 0,
  pid = 0, lk_owner = {len = 4, data = '\000' <repeats 1023 times>},
  gfs_id = 0, auxgids = 0x7feab800223c, auxgidsmall = {0 <repeats 128 times>},
  auxgidlarge = 0x0, auxgidcount = 0,
  msg = {{iov_base = 0x7feacc253840, iov_len = 488},
         {iov_base = 0x0, iov_len = 0} <repeats 15 times>},
  count = 1, iobref = 0x7feab8000c40, rpc_status = 0, rpc_err = 0,
  auth_err = 0, txlist = {next = 0x7feab800256c, prev = 0x7feab800256c},
  payloadsize = 0, cred = {flavour = 390039, datalen = 24,
    authdata = '\000' <repeats 19 times>, "\004", '\000' <repeats 379 times>},
  verf = {flavour = 0, datalen = 0, authdata = '\000' <repeats 399 times>},
  synctask = _gf_true, private = 0x0, trans_private = 0x0,
  hdr_iobuf = 0x82b038, reply = 0x0}
(gdb) p req->procnum
$3 = 3    <== GLUSTERD_BRICK_XLATOR_OP

(gdb) t a a bt

Thread 6 (Thread 0x7feabf178700 (LWP 12055)):
#0  0x00007feaca522043 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007feacbe5076f in event_dispatch_epoll_worker (data=0x878130) at event-epoll.c:664
#2  0x00007feacac4560a in start_thread (arg=0x7feabf178700) at pthread_create.c:334
#3  0x00007feaca521a4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 5 (Thread 0x7feac2659700 (LWP 12048)):
#0  do_sigwait (sig=0x7feac2658e3c, set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:64
#1  __sigwait (set=<optimized out>, sig=0x7feac2658e3c) at ../sysdeps/unix/sysv/linux/sigwait.c:96
#2  0x0000000000409895 in glusterfs_sigwaiter (arg=0x7ffe3debbf00) at glusterfsd.c:2032
#3  0x00007feacac4560a in start_thread (arg=0x7feac2659700) at pthread_create.c:334
#4  0x00007feaca521a4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 4 (Thread 0x7feacc2b4780 (LWP 12046)):
#0  0x00007feacac466ad in pthread_join (threadid=140646205064960, thread_return=0x0) at pthread_join.c:90
#1  0x00007feacbe509bb in event_dispatch_epoll (event_pool=0x830b80) at event-epoll.c:758
#2  0x00007feacbe17a91 in event_dispatch (event_pool=0x830b80) at event.c:124
#3  0x000000000040a3c8 in main (argc=13, argv=0x7ffe3debd0f8) at glusterfsd.c:2376

Thread 3 (Thread 0x7feac2e5a700 (LWP 12047)):
#0  0x00007feacac4e27d in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007feacbdfc152 in gf_timer_proc (ctx=0x7fa010) at timer.c:188
#2  0x00007feacac4560a in start_thread (arg=0x7feac2e5a700) at pthread_create.c:334
#3  0x00007feaca521a4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7feac1e58700 (LWP 12049)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x00007feacbe2d73d in syncenv_task (proc=0x838310) at syncop.c:603
#2  0x00007feacbe2d9dd in syncenv_processor (thdata=0x838310) at syncop.c:695
#3  0x00007feacac4560a in start_thread (arg=0x7feac1e58700) at pthread_create.c:334
#4  0x00007feaca521a4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7feac1657700 (LWP 12050)):
#0  0x000000000040bd31 in glusterfs_handle_translator_op (req=0x7feab8001dec) at glusterfsd-mgmt.c:590
#1  0x00007feacbe2cf04 in synctask_wrap (old_task=0x7feab80031c0) at syncop.c:375
#2  0x00007feaca467f30 in ?? () from /lib64/libc.so.6
#3  0x0000000000000000 in ?? ()

Looking at the core, the crash happened inside glusterfs_handle_translator_op()
while a 'volume heal' command was being run: the handler dereferences
ctx->active (the active graph), which is still NULL at that point. I could then
easily create a small test case to reproduce the issue; please find it
attached.

--Anoop C S.
Attachment:
core-reprod.t
Description: Perl program
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel