Re: regression: brick crashed because of changelog xlator init failure

Atin Mukherjee <amukherj@xxxxxxxxxx> · Sat, 09 May 2015 17:57:25 +0530



On 05/09/2015 04:23 PM, Kotresh Hiremath Ravishankar wrote:
> Hi,
> 
> There are a few regression failures in which changelog translator init fails and a core is generated,
> as explained below.
> 
> 1. Why did changelog translator init fail?
>    
>     In the snapshot test cases, multiple virtual peers are set up on a single node,
>     which causes 'Address already in use' and 'Port is already in use' errors when
>     changelog binds its RPC socket. Hence changelog translator init failed.
> 
> 2. Even if changelog translator init fails, it should not dump core. Why is there a core?
> 
>    Well, the stack trace from the regression run didn't help much.
>    I induced the error manually on a local system and traced it in gdb;
>    the crash happens as below.
> 
>    There is some memory corruption in the cleanup_and_exit path when translator init fails.
>    I suppose this could happen for any translator whose init fails, not only for
>    changelog. Could someone look into this?
> 
> #0  0x00007ffff6cb67e0 in pthread_spin_lock () from /lib64/libpthread.so.0
> #1  0x00007ffff7b70db5 in __gf_free (free_ptr=0x7fffe4031700) at mem-pool.c:303
> #2  0x00007ffff7b7120c in mem_put (ptr=0x7fffe403171c) at mem-pool.c:570
> #3  0x00007ffff7b43fb4 in log_buf_destroy (buf=buf@entry=0x7fffe403171c) at logging.c:357
> #4  0x00007ffff7b47001 in gf_log_flush_list (copy=copy@entry=0x7fffeb80aa50, ctx=ctx@entry=0x614010) at logging.c:1711
> #5  0x00007ffff7b4720d in gf_log_flush_extra_msgs (new=0, ctx=0x614010) at logging.c:1777
> #6  gf_log_set_log_buf_size (buf_size=buf_size@entry=0) at logging.c:270
> #7  0x00007ffff7b47267 in gf_log_disable_suppression_before_exit (ctx=0x614010) at logging.c:437
> #8  0x00000000004080ec in cleanup_and_exit (signum=signum@entry=0) at glusterfsd.c:1217
> #9  0x0000000000408a16 in glusterfs_process_volfp (ctx=ctx@entry=0x614010, fp=fp@entry=0x7fffe40014f0) at glusterfsd.c:2183
> #10 0x000000000040ccf7 in mgmt_getspec_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7fffe4000fa4) at glusterfsd-mgmt.c:1560
> #11 0x00007ffff7915c70 in rpc_clnt_handle_reply (clnt=clnt@entry=0x66d280, pollin=pollin@entry=0x7fffe4002540) at rpc-clnt.c:766
> #12 0x00007ffff7915ee4 in rpc_clnt_notify (trans=<optimized out>, mydata=0x66d2b0, event=<optimized out>, data=0x7fffe4002540) at rpc-clnt.c:894
> #13 0x00007ffff79121f3 in rpc_transport_notify (this=this@entry=0x66d6f0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7fffe4002540)
>     at rpc-transport.c:543
> #14 0x00007fffed2ca1f4 in socket_event_poll_in (this=this@entry=0x66d6f0) at socket.c:2290
> #15 0x00007fffed2ccfb4 in socket_event_handler (fd=fd@entry=8, idx=idx@entry=1, data=0x66d6f0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2403
> #16 0x00007ffff7b9aaba in event_dispatch_epoll_handler (event=0x7fffeb80ae90, event_pool=0x632c80) at event-epoll.c:572
> #17 event_dispatch_epoll_worker (data=0x66e8b0) at event-epoll.c:674
> #18 0x00007ffff6cb1ee5 in start_thread () from /lib64/libpthread.so.0
> #19 0x00007ffff65f8b8d in clone () from /lib64/libc.so.6
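
For reference, the bind failure from point 1 can be reproduced in isolation. The following is a minimal, hypothetical C sketch (not GlusterFS code; `try_listen` and `bound_port` are illustrative names, and the use of 127.0.0.1 with a kernel-chosen port is an assumption made for the example). It opens one listener, then tries to bind a second listener to the same port, as two virtual peers on one node would. The second bind fails with EADDRINUSE, and the helper returns that as a negative errno after closing only the socket it opened itself, rather than aborting.

```c
#define _POSIX_C_SOURCE 200112L

#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP listener on 127.0.0.1:port
 * (port 0 lets the kernel pick a free port).
 * Returns the listening fd on success, or -errno on failure. */
static int try_listen(unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -errno;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(0x7f000001); /* 127.0.0.1 */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {
        int err = errno; /* save errno before close() can change it */
        close(fd);       /* release only what this function opened */
        return -err;     /* e.g. -EADDRINUSE when the port is taken */
    }
    return fd;
}

/* Hypothetical helper: report the local port a socket is bound to,
 * or 0 if it cannot be determined. */
static unsigned short bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    assert(fd >= 0);
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}
```

A caller would open the first listener with `try_listen(0)`, read its port with `bound_port()`, and then see `try_listen(port)` return `-EADDRINUSE`. This sketch only covers the bind failure; it does not reproduce or explain the memory corruption in the cleanup_and_exit backtrace above.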
Probably another candidate for http://review.gluster.org/#/c/10417/ to
go in?

~Atin
> 
> Thanks and Regards,
> Kotresh H R
> 
> ----- Original Message -----
>> From: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
>> To: "Vijay Bellur" <vbellur@xxxxxxxxxx>
>> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
>> Sent: Saturday, May 9, 2015 1:06:07 PM
>> Subject: Re:  regression: brick crashed because of changelog xlator init failure
>>
>> It is crashing in libgcc!!!
>>
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x00007ff5555a1867 in ?? () from ./lib64/libgcc_s.so.1
>> Missing separate debuginfos, use: debuginfo-install
>> glibc-2.12-1.149.el6_6.7.x86_64 keyutils-libs-1.4-5.el6.x86_64
>> krb5-libs-1.10.3-37.el6_6.x86_64 libcom_err-1.41.12-21.el6.x86_64
>> libgcc-4.4.7-11.el6.x86_64 libselinux-2.0.94-5.8.el6.x86_64
>> openssl-1.0.1e-30.el6.8.x86_64 zlib-1.2.3-29.el6.x86_64
>> (gdb) bt
>> #0  0x00007ff5555a1867 in ?? () from ./lib64/libgcc_s.so.1
>> #1  0x00007ff5555a2119 in _Unwind_Backtrace () from ./lib64/libgcc_s.so.1
>> #2  0x00007ff56170b8f6 in backtrace () from ./lib64/libc.so.6
>> #3  0x00007ff562826544 in _gf_msg_backtrace_nomem (level=GF_LOG_ALERT, stacksize=200)
>>     at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/logging.c:1097
>> #4  0x00007ff562845b82 in gf_print_trace (signum=11, ctx=0xabc010)
>>     at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/common-utils.c:618
>> #5  0x0000000000409646 in glusterfsd_print_trace (signum=11)
>>     at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/glusterfsd/src/glusterfsd.c:2007
>> #6  <signal handler called>
>> #7  0x00007ff554484fa9 in ?? ()
>> #8  0x00007ff561d8b9d1 in start_thread () from ./lib64/libpthread.so.0
>> #9  0x00007ff5616f58fd in clone () from ./lib64/libc.so.6
>>
>>
>> Thanks and Regards,
>> Kotresh H R
>>
>> ----- Original Message -----
>>> From: "Vijay Bellur" <vbellur@xxxxxxxxxx>
>>> To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>, "Pranith Kumar
>>> Karampuri" <pkarampu@xxxxxxxxxx>
>>> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
>>> Sent: Saturday, May 9, 2015 12:52:33 PM
>>> Subject: Re:  regression: brick crashed because of changelog
>>> xlator init failure
>>>
>>> On 05/09/2015 12:49 PM, Kotresh Hiremath Ravishankar wrote:
>>>> If you look at the logs below, socket binding failed with an 'Address
>>>> already in use' / 'Port is already in use' error.
>>>> Because of that, changelog failed to start its rpc server, and hence its init failed.
>>>> Not sure why socket binding failed on this machine.
>>>>
>>>> [2015-05-08 21:34:47.747059] E [socket.c:823:__socket_server_bind] 0-socket.patchy-changelog: binding to  failed: Address already in use
>>>> [2015-05-08 21:34:47.747078] E [socket.c:826:__socket_server_bind] 0-socket.patchy-changelog: Port is already in use
>>>> [2015-05-08 21:34:47.747096] W [rpcsvc.c:1602:rpcsvc_transport_create] 0-rpc-service: listening on transport failed
>>>> [2015-05-08 21:34:47.747197] I [mem-pool.c:587:mem_pool_destroy] 0-patchy-changelog: size=116 max=0 total=0
>>>> [2015-05-08 21:34:47.750460] E [xlator.c:426:xlator_init] 0-patchy-changelog: Initialization of volume 'patchy-changelog' failed, review your volfile again
>>>> [2015-05-08 21:34:47.750485] E [graph.c:322:glusterfs_graph_init] 0-patchy-changelog: initializing translator failed
>>>> [2015-05-08 21:34:47.750497] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed
>>>> [2015-05-08 21:34:47.749020] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
>>>
>>> Irrespective of a socket bind failing, we should not crash. Any ideas
>>> why glusterfsd crashed?
>>>
>>> -Vijay
>>>
>>>
>>>
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel@xxxxxxxxxxx
>> http://www.gluster.org/mailman/listinfo/gluster-devel
>>
> 

-- 
~Atin