On Wed, 2016-04-20 at 16:24 +0530, Atin Mukherjee wrote:
> I should have said the regression link is irrelevant here. Try
> running this test on your local setup multiple times on mainline.
> I do believe you should see the crash.

I could see a coredump after running trash.t multiple times in a while loop.

Info from the coredump:

Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000040bd31 in glusterfs_handle_translator_op (req=0x7feab8001dec)
    at glusterfsd-mgmt.c:590
590             any = active->first;
[Current thread is 1 (Thread 0x7feac1657700 (LWP 12050))]

(gdb) l
585                 goto out;
586             }
587
588             ctx = glusterfsd_ctx;
589             active = ctx->active;
590             any = active->first;
591             input = dict_new ();
592             ret = dict_unserialize (xlator_req.input.input_val,
593                                     xlator_req.input.input_len,
594                                     &input);
(gdb) p ctx
$1 = (glusterfs_ctx_t *) 0x7fa010
(gdb) p ctx->active
$2 = (glusterfs_graph_t *) 0x0
(gdb) p *req
$1 = {trans = 0x7feab8000e20, svc = 0x83ca50, prog = 0x874810, xid = 1,
  prognum = 4867634, progver = 2, procnum = 3, type = 0, uid = 0, gid = 0,
  pid = 0, lk_owner = {len = 4, data = '\000' <repeats 1023 times>},
  gfs_id = 0, auxgids = 0x7feab800223c, auxgidsmall = {0 <repeats 128 times>},
  auxgidlarge = 0x0, auxgidcount = 0,
  msg = {{iov_base = 0x7feacc253840, iov_len = 488},
         {iov_base = 0x0, iov_len = 0} <repeats 15 times>},
  count = 1, iobref = 0x7feab8000c40, rpc_status = 0, rpc_err = 0,
  auth_err = 0, txlist = {next = 0x7feab800256c, prev = 0x7feab800256c},
  payloadsize = 0, cred = {flavour = 390039, datalen = 24,
    authdata = '\000' <repeats 19 times>, "\004", '\000' <repeats 379 times>},
  verf = {flavour = 0, datalen = 0, authdata = '\000' <repeats 399 times>},
  synctask = _gf_true, private = 0x0, trans_private = 0x0,
  hdr_iobuf = 0x82b038, reply = 0x0}
(gdb) p req->procnum
$3 = 3    <== GLUSTERD_BRICK_XLATOR_OP

(gdb) t a a bt

Thread 6 (Thread 0x7feabf178700 (LWP 12055)):
#0  0x00007feaca522043 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007feacbe5076f in event_dispatch_epoll_worker (data=0x878130) at event-epoll.c:664
#2  0x00007feacac4560a in start_thread (arg=0x7feabf178700) at pthread_create.c:334
#3  0x00007feaca521a4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 5 (Thread 0x7feac2659700 (LWP 12048)):
#0  do_sigwait (sig=0x7feac2658e3c, set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:64
#1  __sigwait (set=<optimized out>, sig=0x7feac2658e3c) at ../sysdeps/unix/sysv/linux/sigwait.c:96
#2  0x0000000000409895 in glusterfs_sigwaiter (arg=0x7ffe3debbf00) at glusterfsd.c:2032
#3  0x00007feacac4560a in start_thread (arg=0x7feac2659700) at pthread_create.c:334
#4  0x00007feaca521a4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 4 (Thread 0x7feacc2b4780 (LWP 12046)):
#0  0x00007feacac466ad in pthread_join (threadid=140646205064960, thread_return=0x0) at pthread_join.c:90
#1  0x00007feacbe509bb in event_dispatch_epoll (event_pool=0x830b80) at event-epoll.c:758
#2  0x00007feacbe17a91 in event_dispatch (event_pool=0x830b80) at event.c:124
#3  0x000000000040a3c8 in main (argc=13, argv=0x7ffe3debd0f8) at glusterfsd.c:2376

Thread 3 (Thread 0x7feac2e5a700 (LWP 12047)):
#0  0x00007feacac4e27d in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007feacbdfc152 in gf_timer_proc (ctx=0x7fa010) at timer.c:188
#2  0x00007feacac4560a in start_thread (arg=0x7feac2e5a700) at pthread_create.c:334
#3  0x00007feaca521a4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7feac1e58700 (LWP 12049)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x00007feacbe2d73d in syncenv_task (proc=0x838310) at syncop.c:603
#2  0x00007feacbe2d9dd in syncenv_processor (thdata=0x838310) at syncop.c:695
#3  0x00007feacac4560a in start_thread (arg=0x7feac1e58700) at pthread_create.c:334
#4  0x00007feaca521a4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7feac1657700 (LWP 12050)):
#0  0x000000000040bd31 in glusterfs_handle_translator_op (req=0x7feab8001dec) at glusterfsd-mgmt.c:590
#1  0x00007feacbe2cf04 in synctask_wrap (old_task=0x7feab80031c0) at syncop.c:375
#2  0x00007feaca467f30 in ?? () from /lib64/libc.so.6
#3  0x0000000000000000 in ?? ()

Looking at the core, the crash happened inside glusterfs_handle_translator_op()
while a 'volume heal' command was being run: the handler dereferences
ctx->active (the active graph), which is still NULL at that point. I could then
easily create a small test case to reproduce the issue; please find it
attached.

--Anoop C S.
Attachment:
core-reprod.t
Description: Perl program
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel