glusterfs 3.3 self-heal daemon crashes and can't be started

yongtaofu at gmail.com (符永涛) · Thu, 14 Mar 2013 18:48:37 +0800

Hi Vijay Bellur,
I just changed static char oldvolfile[131072] to a larger value. Please
correct me if this has any side effects.
Thank you very much.
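
A larger fixed size only raises the limit; a volume file bigger than the
new size would overflow the array again. For discussion, here is a minimal
sketch of one alternative, assuming the old volfile can be kept in a heap
buffer that grows to the size of the incoming spec. It is not the actual
glusterfs code: struct volfile_cache and cache_volfile are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the fixed-size static oldvolfile[131072]:
 * a heap buffer that grows to fit whatever spec size arrives. */
struct volfile_cache {
    char  *data;  /* cached copy of the last volfile, or NULL */
    size_t len;   /* bytes currently stored */
    size_t cap;   /* bytes currently allocated */
};

/* Copy `size` bytes of `spec` into the cache, growing it first if needed.
 * Returns 0 on success, -1 if the allocation fails (the previous
 * contents are left untouched in that case). */
static int cache_volfile(struct volfile_cache *c, const char *spec, size_t size)
{
    assert(c != NULL && (spec != NULL || size == 0));

    if (size > c->cap) {
        char *grown = realloc(c->data, size);
        if (grown == NULL)
            return -1;
        c->data = grown;
        c->cap = size;
    }
    if (size > 0)
        memcpy(c->data, spec, size);  /* size <= c->cap is guaranteed here */
    c->len = size;
    return 0;
}
```

With a zero-initialized struct volfile_cache, a call such as
cache_volfile(&cache, rsp.spec, size) would take the place of the
unchecked memcpy, and the caller would have to handle the -1 return.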

2013/3/14, 符永涛 <yongtaofu at gmail.com>:
> So it has nothing to do with rebalance.
>
> 2013/3/14, 符永涛 <yongtaofu at gmail.com>:
>> I have fixed this bug in our local glusterfs 3.3 repo. The root cause is
>> in glusterfs 3.3, glusterfsd/src/glusterfsd-mgmt.c line 1394:
>> static char oldvolfile[131072];
>>
>> If the volume file
>> (/var/lib/glusterd/glustershd/glustershd-server.vol) is larger than
>> 128K, the copy on line 1629 writes past the end of this buffer and
>> the daemon crashes:
>>  memcpy (oldvolfile, rsp.spec, size);
>> This happens when there are a lot of volumes on the server, so the
>> server volume file grows beyond 128K.
>>
>> This looks like a bug.
>>
>> FYI
>> Thank you very much.
>>
>> 2013/3/14, Vijay Bellur <vbellur at redhat.com>:
>>> On 03/14/2013 02:08 PM, 符永涛 wrote:
>>>> Dear glusterfs experts,
>>>> Recently we encountered a self-heal daemon crash after
>>>> rebalancing a volume.
>>>> Crash stack below:
>>>> +------------------------------------------------------------------------------+
>>>> pending frames:
>>>>
>>>> patchset: git://git.gluster.com/glusterfs.git
>>>> signal received: 11
>>>> time of crash: 2013-03-14 16:33:50
>>>> configuration details:
>>>> argp 1
>>>> backtrace 1
>>>> dlfcn 1
>>>> fdatasync 1
>>>> libpthread 1
>>>> llistxattr 1
>>>> setfsid 1
>>>> spinlock 1
>>>> epoll.h 1
>>>> xattr.h 1
>>>> st_atim.tv_nsec 1
>>>> package-string: glusterfs 3.3.0
>>>> /lib64/libc.so.6[0x38d0a32920]
>>>> /lib64/libc.so.6(memcpy+0x309)[0x38d0a88da9]
>>>> /usr/sbin/glusterfs(mgmt_getspec_cbk+0x398)[0x40c888]
>>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x38d1a0f4d5]
>>>> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x120)[0x38d1a0fcd0]
>>>> /usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x38d1a0aeb8]
>>>> /usr/lib64/glusterfs/3.3.0/rpc-transport/socket.so(socket_event_poll_in+0x34)[0x7f1d47b8f784]
>>>> /usr/lib64/glusterfs/3.3.0/rpc-transport/socket.so(socket_event_handler+0xc7)[0x7f1d47b8f867]
>>>> /usr/lib64/libglusterfs.so.0[0x38d1e3e4a4]
>>>> /usr/sbin/glusterfs(main+0x58a)[0x40731a]
>>>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x38d0a1ecdd]
>>>> /usr/sbin/glusterfs[0x404289]
>>>> ---------
>>>>
>>>> Does anyone know how to fix it? Currently the self-heal daemon can't be
>>>> started.
>>>
>>> Can you please post details of your volume configuration and glustershd
>>> logs from the node where the crash is seen?
>>>
>>> Thanks,
>>> Vijay
>>>
>>>
>>>
>>
>>
>> --
>> 符永涛
>>
>
>
> --
> 符永涛
>

-- 
符永涛