+Krutika

----- Original Message -----
> From: "Anoop C S" <anoopcs@xxxxxxxxxx>
> To: "Atin Mukherjee" <amukherj@xxxxxxxxxx>
> Cc: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>, "Ravishankar N" <ravishankar@xxxxxxxxxx>, "Anuradha Talur" <atalur@xxxxxxxxxx>, gluster-devel@xxxxxxxxxxx
> Sent: Friday, April 22, 2016 2:14:28 PM
> Subject: Re: Core generated by trash.t
>
> On Wed, 2016-04-20 at 16:24 +0530, Atin Mukherjee wrote:
> > I should have said the regression link is irrelevant here. Try running
> > this test on your local setup multiple times on mainline. I do believe
> > you should see the crash.
> >
> I could see a coredump on running trash.t multiple times in a while loop.
> Info from the coredump:
>
> Core was generated by `/usr/local/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  0x000000000040bd31 in glusterfs_handle_translator_op (req=0x7feab8001dec) at glusterfsd-mgmt.c:590
> 590             any = active->first;
> [Current thread is 1 (Thread 0x7feac1657700 (LWP 12050))]
> (gdb) l
> 585                     goto out;
> 586             }
> 587
> 588             ctx = glusterfsd_ctx;
> 589             active = ctx->active;
> 590             any = active->first;
> 591             input = dict_new ();
> 592             ret = dict_unserialize (xlator_req.input.input_val,
> 593                                     xlator_req.input.input_len,
> 594                                     &input);
> (gdb) p ctx
> $1 = (glusterfs_ctx_t *) 0x7fa010
> (gdb) p ctx->active
> $2 = (glusterfs_graph_t *) 0x0

I think this is because the request came to shd even before the graph is
initialized? Thanks for the test case. I will take a look at this. (A minimal
sketch of a possible guard for this is at the end of this mail, after the
quoted backtrace.)

Pranith

> (gdb) p *req
> $1 = {trans = 0x7feab8000e20, svc = 0x83ca50, prog = 0x874810, xid = 1,
>   prognum = 4867634, progver = 2, procnum = 3, type = 0, uid = 0, gid = 0,
>   pid = 0, lk_owner = {len = 4, data = '\000' <repeats 1023 times>},
>   gfs_id = 0, auxgids = 0x7feab800223c, auxgidsmall = {0 <repeats 128 times>},
>   auxgidlarge = 0x0, auxgidcount = 0, msg = {{iov_base = 0x7feacc253840,
>       iov_len = 488}, {iov_base = 0x0, iov_len = 0} <repeats 15 times>},
>   count = 1, iobref = 0x7feab8000c40, rpc_status = 0, rpc_err = 0,
>   auth_err = 0, txlist = {next = 0x7feab800256c, prev = 0x7feab800256c},
>   payloadsize = 0, cred = {flavour = 390039, datalen = 24,
>     authdata = '\000' <repeats 19 times>, "\004", '\000' <repeats 379 times>},
>   verf = {flavour = 0, datalen = 0, authdata = '\000' <repeats 399 times>},
>   synctask = _gf_true, private = 0x0, trans_private = 0x0,
>   hdr_iobuf = 0x82b038, reply = 0x0}
> (gdb) p req->procnum
> $3 = 3  <== GLUSTERD_BRICK_XLATOR_OP
> (gdb) t a a bt
>
> Thread 6 (Thread 0x7feabf178700 (LWP 12055)):
> #0  0x00007feaca522043 in epoll_wait () at ../sysdeps/unix/syscall-template.S:84
> #1  0x00007feacbe5076f in event_dispatch_epoll_worker (data=0x878130) at event-epoll.c:664
> #2  0x00007feacac4560a in start_thread (arg=0x7feabf178700) at pthread_create.c:334
> #3  0x00007feaca521a4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Thread 5 (Thread 0x7feac2659700 (LWP 12048)):
> #0  do_sigwait (sig=0x7feac2658e3c, set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:64
> #1  __sigwait (set=<optimized out>, sig=0x7feac2658e3c) at ../sysdeps/unix/sysv/linux/sigwait.c:96
> #2  0x0000000000409895 in glusterfs_sigwaiter (arg=0x7ffe3debbf00) at glusterfsd.c:2032
> #3  0x00007feacac4560a in start_thread (arg=0x7feac2659700) at pthread_create.c:334
> #4  0x00007feaca521a4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Thread 4 (Thread 0x7feacc2b4780 (LWP 12046)):
> #0  0x00007feacac466ad in pthread_join (threadid=140646205064960, thread_return=0x0) at pthread_join.c:90
> #1  0x00007feacbe509bb in event_dispatch_epoll (event_pool=0x830b80) at event-epoll.c:758
> #2  0x00007feacbe17a91 in event_dispatch (event_pool=0x830b80) at event.c:124
> #3  0x000000000040a3c8 in main (argc=13, argv=0x7ffe3debd0f8) at glusterfsd.c:2376
>
> Thread 3 (Thread 0x7feac2e5a700 (LWP 12047)):
> #0  0x00007feacac4e27d in nanosleep () at ../sysdeps/unix/syscall-template.S:84
> #1  0x00007feacbdfc152 in gf_timer_proc (ctx=0x7fa010) at timer.c:188
> #2  0x00007feacac4560a in start_thread (arg=0x7feac2e5a700) at pthread_create.c:334
> #3  0x00007feaca521a4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Thread 2 (Thread 0x7feac1e58700 (LWP 12049)):
> #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
> #1  0x00007feacbe2d73d in syncenv_task (proc=0x838310) at syncop.c:603
> #2  0x00007feacbe2d9dd in syncenv_processor (thdata=0x838310) at syncop.c:695
> #3  0x00007feacac4560a in start_thread (arg=0x7feac1e58700) at pthread_create.c:334
> #4  0x00007feaca521a4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> Thread 1 (Thread 0x7feac1657700 (LWP 12050)):
> #0  0x000000000040bd31 in glusterfs_handle_translator_op (req=0x7feab8001dec) at glusterfsd-mgmt.c:590
> #1  0x00007feacbe2cf04 in synctask_wrap (old_task=0x7feab80031c0) at syncop.c:375
> #2  0x00007feaca467f30 in ?? () from /lib64/libc.so.6
> #3  0x0000000000000000 in ?? ()
>
> Looking at the core, the crash comes from the glusterfs_handle_translator_op()
> routine while a 'volume heal' command is being handled. I could then easily
> create a small test case to reproduce the issue. Please find it attached.
>
> --Anoop C S.
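
For reference, below is a minimal standalone sketch of the kind of guard
mentioned above: refuse the brick op while ctx->active is still NULL instead of
dereferencing it. The types and names here (graph_t, ctx_t,
handle_translator_op) are simplified stand-ins for illustration only, not the
real glusterfs_ctx_t/glusterfs_graph_t definitions and not an actual patch.

#include <stdio.h>

/* Simplified stand-in for glusterfs_graph_t */
typedef struct graph {
        void *first;            /* first translator in the graph */
} graph_t;

/* Simplified stand-in for glusterfs_ctx_t */
typedef struct ctx {
        graph_t *active;        /* stays NULL until the volfile graph is set up */
} ctx_t;

/* Rough analogue of glusterfs_handle_translator_op(): fail the request
 * cleanly when the graph is not initialized yet, instead of crashing on
 * active->first. */
static int
handle_translator_op (ctx_t *ctx)
{
        graph_t *active = NULL;
        void    *any    = NULL;
        int      ret    = -1;

        if (!ctx)
                goto out;

        active = ctx->active;
        if (!active) {
                /* graph not initialized yet; reject instead of dereferencing NULL */
                fprintf (stderr, "volfile graph not initialized yet, "
                         "rejecting brick op\n");
                goto out;
        }

        any = active->first;
        (void) any;             /* ... unserialize the input dict, run the op ... */
        ret = 0;
out:
        return ret;
}

int
main (void)
{
        /* Simulate a GLUSTERD_BRICK_XLATOR_OP arriving before graph init:
         * the handler now returns -1 instead of segfaulting. */
        ctx_t ctx = { .active = NULL };

        printf ("ret = %d\n", handle_translator_op (&ctx));
        return 0;
}

Applied just before the dereference at glusterfsd-mgmt.c:590 shown in the
listing above, the same check would avoid the NULL dereference and let the
request fail cleanly instead of taking down the shd process.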