Hi,
Debugging with gdb shows that the root cause seems to be quite
straightforward. The gluster version is 3.4.5 and the stack is:
#0 0x00007eff735fe354 in dht_getxattr_cbk (frame=0x7eff775b6360,
cookie=<value optimized out>, this=<value optimized out>, op_ret=<value
optimized out>, op_errno=0,
xattr=<value optimized out>, xdata=0x0) at dht-common.c:2043
2043 DHT_STACK_UNWIND (getxattr, frame, local->op_ret,
op_errno,
Missing separate debuginfos, use: debuginfo-install
glibc-2.12-1.80.el6.x86_64 keyutils-libs-1.4-4.el6.x86_64
krb5-libs-1.9-33.el6.x86_64 libcom_err-1.41.12-12.el6.x86_64
libgcc-4.4.6-4.el6.x86_64 libselinux-2.0.94-5.3.el6.x86_64
openssl-1.0.1e-16.el6_5.14.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0 0x00007eff735fe354 in dht_getxattr_cbk (frame=0x7eff775b6360,
cookie=<value optimized out>, this=<value optimized out>, op_ret=<value
optimized out>, op_errno=0,
xattr=<value optimized out>, xdata=0x0) at dht-common.c:2043
#1 0x00007eff7383c168 in afr_getxattr_cbk (frame=0x7eff7756ab58,
cookie=<value optimized out>, this=<value optimized out>, op_ret=0,
op_errno=0, dict=0x7eff76f21dc8, xdata=0x0)
at afr-inode-read.c:618
#2 0x00007eff73aaaad8 in client3_3_getxattr_cbk (req=<value optimized
out>, iov=<value optimized out>, count=<value optimized out>,
myframe=0x7eff77554d4c) at client-rpc-fops.c:1115
#3 0x0000003de700d6f5 in rpc_clnt_handle_reply (clnt=0xc36ad0,
pollin=0x14b21560) at rpc-clnt.c:771
#4 0x0000003de700ec6f in rpc_clnt_notify (trans=<value optimized out>,
mydata=0xc36b00, event=<value optimized out>, data=<value optimized
out>) at rpc-clnt.c:891
#5 0x0000003de700a4e8 in rpc_transport_notify (this=<value optimized
out>, event=<value optimized out>, data=<value optimized out>) at
rpc-transport.c:497
#6 0x00007eff74af6216 in socket_event_poll_in (this=0xc46530) at
socket.c:2118
#7 0x00007eff74af7c3d in socket_event_handler (fd=<value optimized
out>, idx=<value optimized out>, data=0xc46530, poll_in=1, poll_out=0,
poll_err=0) at socket.c:2230
#8 0x0000003de785e907 in event_dispatch_epoll_handler
(event_pool=0xb70e90) at event-epoll.c:384
#9 event_dispatch_epoll (event_pool=0xb70e90) at event-epoll.c:445
#10 0x0000000000406818 in main (argc=4, argv=0x7fff24878238) at
glusterfsd.c:1934
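See dht_getxattr_cbk() (below). When frame->local is NULL,
VALIDATE_OR_GOTO() jumps to the label "out" while local is still NULL.
Since this_call_cnt is still 0 at that point, the is_last_call() check
passes, and the process crashes when it dereferences local
(local->op_ret and local->xattr, i.e. 0->xattr). Note that in
DHT_STACK_UNWIND()->STACK_UNWIND_STRICT(), fn looks fine.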
(gdb) p __local
$11 = (dht_local_t *) 0x0
(gdb) p frame->local
$12 = (void *) 0x0
(gdb) p fn
$1 = (fop_getxattr_cbk_t) 0x7eff7298c940 <mdc_readv_cbk>
I have not read much of the dht code, so I have no idea whether a NULL
frame->local is normal or not, but from the code's perspective this is
an obvious bug, and it still exists in the latest glusterfs code.
The following code change is a simple fix, but maybe there's a better one.
- if (is_last_call (this_call_cnt)) {
+ if (is_last_call (this_call_cnt) && local != NULL) {
Similar issues exist in other functions too, e.g. stripe_getxattr_cbk()
(I did not check all of the code).
int
dht_getxattr_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                  int op_ret, int op_errno, dict_t *xattr, dict_t *xdata)
{
        int          this_call_cnt = 0;
        dht_local_t *local         = NULL;

        VALIDATE_OR_GOTO (frame, out);
        VALIDATE_OR_GOTO (frame->local, out);
        ......
out:
        if (is_last_call (this_call_cnt)) {
                DHT_STACK_UNWIND (getxattr, frame, local->op_ret, op_errno,
                                  local->xattr, NULL);
        }
        return 0;
}
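
To make the failure mode and the guard easier to see in isolation, here is a
minimal standalone sketch. It is not glusterfs code: sketch_frame_t,
sketch_local_t, sketch_result_t and sketch_unwind() are hypothetical stand-ins
for call_frame_t, dht_local_t and DHT_STACK_UNWIND. It also differs from the
one-line fix above: instead of skipping the unwind when local is NULL, it
unwinds with an error (-1/EINVAL) so the caller still gets a reply. Whether
the real DHT_STACK_UNWIND tolerates a NULL frame->local is an assumption I
have not verified.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for dht_local_t and call_frame_t. */
typedef struct {
        int   op_ret;
        void *xattr;
} sketch_local_t;

typedef struct {
        void *local;
} sketch_frame_t;

/* Records what the callback handed to the next layer up. */
typedef struct {
        int   called;
        int   op_ret;
        int   op_errno;
        void *xattr;
} sketch_result_t;

/* Stand-in for DHT_STACK_UNWIND: it only records its arguments. */
static void
sketch_unwind (sketch_result_t *res, int op_ret, int op_errno, void *xattr)
{
        res->called   = 1;
        res->op_ret   = op_ret;
        res->op_errno = op_errno;
        res->xattr    = xattr;
}

static int
sketch_getxattr_cbk (sketch_frame_t *frame, int op_errno,
                     sketch_result_t *res)
{
        int             this_call_cnt = 0;
        sketch_local_t *local         = NULL;

        /* Same shape as the two VALIDATE_OR_GOTO() checks. */
        if (frame == NULL)
                goto out;
        if (frame->local == NULL)
                goto out;

        local = frame->local;
        /* ... per-subvolume aggregation elided ... */

out:
        if (this_call_cnt == 0) {
                if (local != NULL) {
                        /* Normal path: local is safe to dereference. */
                        sketch_unwind (res, local->op_ret, op_errno,
                                       local->xattr);
                } else if (frame != NULL) {
                        /* Guarded path: report an error, never touch local. */
                        sketch_unwind (res, -1, EINVAL, NULL);
                }
                /* frame == NULL: there is nothing to unwind at all. */
        }
        return 0;
}
```

Calling sketch_getxattr_cbk() with a frame whose local is NULL now records
an error reply instead of dereferencing a NULL pointer; with a valid local it
passes local->op_ret and local->xattr through unchanged.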