So it has nothing to do with rebalance.

2013/3/14, ??? <yongtaofu at gmail.com>:
> I have fixed this bug in our local glusterfs 3.3 repo. The root cause is
> in glusterfs 3.3, glusterfsd/src/glusterfsd-mgmt.c, line 1394:
>
> static char oldvolfile[131072];
>
> So if the volume file
> (/var/lib/glusterd/glustershd/glustershd-server.vol) is larger than
> 128K, it simply crashes. This happens if there are a lot of volumes on
> the server and the server volume file grows larger than 128K.
> On line 1629:
>
> memcpy (oldvolfile, rsp.spec, size);
>
> It should be a bug.
>
> FYI
> Thank you very much.
>
> 2013/3/14, Vijay Bellur <vbellur at redhat.com>:
>> On 03/14/2013 02:08 PM, ??? wrote:
>>> Dear glusterfs experts,
>>> Recently we have encountered a self-heal daemon crash after
>>> rebalancing a volume.
>>> Crash stack below:
>>> +------------------------------------------------------------------------------+
>>> pending frames:
>>>
>>> patchset: git://git.gluster.com/glusterfs.git
>>> signal received: 11
>>> time of crash: 2013-03-14 16:33:50
>>> configuration details:
>>> argp 1
>>> backtrace 1
>>> dlfcn 1
>>> fdatasync 1
>>> libpthread 1
>>> llistxattr 1
>>> setfsid 1
>>> spinlock 1
>>> epoll.h 1
>>> xattr.h 1
>>> st_atim.tv_nsec 1
>>> package-string: glusterfs 3.3.0
>>> /lib64/libc.so.6[0x38d0a32920]
>>> /lib64/libc.so.6(memcpy+0x309)[0x38d0a88da9]
>>> /usr/sbin/glusterfs(mgmt_getspec_cbk+0x398)[0x40c888]
>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x38d1a0f4d5]
>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x38d1a0fcd0]
>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x38d1a0aeb8]
>>> /usr/lib64/glusterfs/3.3.0/rpc-transport/socket.so(socket_event_poll_in+0x34)[0x7f1d47b8f784]
>>> /usr/lib64/glusterfs/3.3.0/rpc-transport/socket.so(socket_event_handler+0xc7)[0x7f1d47b8f867]
>>> /usr/lib64/libglusterfs.so.0[0x38d1e3e4a4]
>>> /usr/sbin/glusterfs(main+0x58a)[0x40731a]
>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x38d0a1ecdd]
>>> /usr/sbin/glusterfs[0x404289]
>>> ---------
>>>
>>> Does anyone know how to fix it? Currently the self-heal daemon can't
>>> be started.
>>
>> Can you please post details of your volume configuration and glustershd
>> logs from the node where the crash is seen?
>>
>> Thanks,
>> Vijay
>
> --
> ???

--
???
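
For anyone hitting the same crash, here is a minimal sketch of the failure pattern and one possible way out. Only the oldvolfile declaration and the memcpy line come from glusterfsd-mgmt.c as quoted above; the function names (cache_volfile_unsafe, cache_volfile), the main() driver and the surrounding error handling are illustrative assumptions, not the actual GlusterFS code or the official patch.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Failure pattern from glusterfs 3.3 (mgmt_getspec_cbk): the fetched
     * volfile is cached in a fixed 128K static buffer. */
    static char oldvolfile[131072];
    static int  oldvollen;

    static void
    cache_volfile_unsafe (const char *spec, int size)
    {
            /* No bounds check: once glustershd-server.vol exceeds 128K
             * (many volumes on one server), this memcpy writes past the
             * end of oldvolfile and the daemon dies with SIGSEGV inside
             * memcpy, matching the backtrace above. */
            memcpy (oldvolfile, spec, size);
            oldvollen = size;
    }

    /* One possible fix (an assumption, not necessarily the upstream
     * patch): size the cached copy to the response instead of relying
     * on a compile-time constant. */
    static char *volfile_cache;
    static int   volfile_cache_len;

    static int
    cache_volfile (const char *spec, int size)
    {
            char *tmp = realloc (volfile_cache, size);
            if (!tmp)
                    return -1;      /* out of memory; keep the old copy */
            volfile_cache = tmp;
            memcpy (volfile_cache, spec, size);
            volfile_cache_len = size;
            return 0;
    }

    int
    main (void)
    {
            const char *spec = "volume test-client-0\n ...\nend-volume\n";

            if (cache_volfile (spec, (int) strlen (spec) + 1) == 0)
                    printf ("cached %d bytes\n", volfile_cache_len);
            (void) cache_volfile_unsafe;    /* silence unused warning */
            free (volfile_cache);
            return 0;
    }

Whichever way it is fixed upstream, the point is that the cached buffer has to grow with rsp.spec, because the glustershd server volume file scales with the number of volumes. If the copy must stay in a static array, the size should at least be checked against sizeof (oldvolfile) before the memcpy and the oversized spec rejected with an error instead of overflowing the buffer.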