Re: Possible bug in the communications layer?

Hi Jeff,

On 28.04.2016 15:20, Jeff Darcy wrote:

This happens with Gluster 3.7.11 accessed through Ganesha and gfapi. The volume is a distributed-disperse 4*(4+2). I'm able to reproduce the problem easily with the following test:

iozone -t2 -s10g -r1024k -i0 -w -F /iozone{1..2}.dat
echo 3 >/proc/sys/vm/drop_caches
iozone -t2 -s10g -r1024k -i1 -w -F /iozone{1..2}.dat

The error happens soon after starting the read test. As can be seen in the data below, client3_3_readv_cbk() is processing an iovec of 116 bytes, however it should be 154 bytes (the buffer in memory really does seem to contain 154 bytes). The data on the network seems ok (at least I haven't been able to identify any problem), so this must be a processing error on the client side. The last field in the truncated buffer of the serialized data corresponds to the length of the xdata field: 0x26. So at least 38 more bytes should be present.
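As a quick sanity check, the numbers quoted above add up as expected; the snippet below (a minimal sketch in plain C, not the actual GlusterFS client code) only makes the missing-length arithmetic explicit:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical illustration of the mismatch described above, using the
 * numbers from this report; not the actual client3_3_readv_cbk() code. */
int main(void)
{
    uint32_t received_len = 116;  /* length of the iovec seen by the callback     */
    uint32_t xdata_len    = 0x26; /* last field in the truncated buffer: 38 bytes */
    uint32_t expected_len = received_len + xdata_len;

    printf("received %u bytes, expected at least %u bytes (%u missing)\n",
           received_len, expected_len, xdata_len);  /* prints 116, 154, 38 */
    return 0;
}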
Nice detective work, Xavi.  It would be *very* interesting to see what
the value of the "count" parameter is (it's unfortunately optimized out).
I'll bet it's two, and iov[1].iov_len is 38.  I have a weak memory of
some problems with how this iov is put together, a couple of years ago,
and it looks like you might have tripped over one more.
 
It seems you are right. The count is 2 and the first 38 bytes of the second vector contain the remaining data of the xdata field. The rest of the data in the second vector seems to be the payload of the readv fop, plus 2 bytes of padding:
 
(gdb) f 0
#0 client3_3_readv_cbk (req=0x7fdc4051a31c, iov=0x7fdc4051a35c, count=<optimized out>, myframe=0x7fdc520d505c) at client-rpc-fops.c:3021
3021 gf_msg (this->name, GF_LOG_ERROR, EINVAL,
(gdb) print *iov
$2 = {iov_base = 0x7fdc14b0d018, iov_len = 116}
(gdb) f 1
#1 0x00007fdc56dafab0 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7fdc3c1f4bb0, pollin=pollin@entry=0x7fdc34010f20) at rpc-clnt.c:764
764 req->cbkfn (req, req->rsp, req->rspcnt, saved_frame->frame);
(gdb) print *pollin
$3 = {vector = {{iov_base = 0x7fdc14b0d000, iov_len = 140}, {iov_base = 0x7fdc14a4d000, iov_len = 32808}, {iov_base = 0x0, iov_len = 0} <repeats 14 times>}, count = 2,
vectored = 1 '\001', private = 0x7fdc340106c0, iobref = 0x7fdc34006660, hdr_iobuf = 0x7fdc3c4c07c0, is_reply = 1 '\001'}
(gdb) f 0
#0 client3_3_readv_cbk (req=0x7fdc4051a31c, iov=0x7fdc4051a35c, count=<optimized out>, myframe=0x7fdc520d505c) at client-rpc-fops.c:3021
3021 gf_msg (this->name, GF_LOG_ERROR, EINVAL,
(gdb) print iov[1]
$4 = {iov_base = 0x7fdc14a4d000, iov_len = 32808}
(gdb) print iov[2]
$5 = {iov_base = 0x2, iov_len = 140583741974112}
(gdb) x/128xb 0x7fdc14a4d000
0x7fdc14a4d000: 0x00 0x00 0x00 0x01 0x00 0x00 0x00 0x17
0x7fdc14a4d008: 0x00 0x00 0x00 0x02 0x67 0x6c 0x75 0x73
0x7fdc14a4d010: 0x74 0x65 0x72 0x66 0x73 0x2e 0x69 0x6e
0x7fdc14a4d018: 0x6f 0x64 0x65 0x6c 0x6b 0x2d 0x63 0x6f
0x7fdc14a4d020: 0x75 0x6e 0x74 0x00 0x31 0x00 0x00 0x00
0x7fdc14a4d028: 0x5c 0x5c 0x5c 0x5c 0x5c 0x5c 0x5c 0x5c
0x7fdc14a4d030: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d038: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d040: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d048: 0x5c 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d050: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d058: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d060: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d068: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d070: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fdc14a4d078: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
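Reading the dump, iov[1] appears to break down as follows. The sketch below is plain C and only restates what the hex dump shows; the field boundaries are an assumption based on this dump, not on the wire protocol definition:

#include <stdio.h>
#include <sys/uio.h>

/* Apparent layout of iov[1], derived from the hex dump above (assumption):
 *   bytes  0..37   remaining 38 bytes of the serialized xdata dict:
 *                  pair count (4) + key len 0x17 (4) + value len 0x02 (4)
 *                  + "glusterfs.inodelk-count\0" (24) + "1\0" (2) = 38
 *   bytes 38..39   2 bytes that look like padding to a 4-byte boundary
 *   bytes 40..end  readv payload
 */
int main(void)
{
    struct iovec iov1 = { .iov_base = NULL, .iov_len = 32808 };

    size_t xdata_tail = 38;
    size_t padding    = (4 - xdata_tail % 4) % 4;             /* 2     */
    size_t payload    = iov1.iov_len - xdata_tail - padding;  /* 32768 */

    printf("xdata tail: %zu, padding: %zu, readv payload: %zu\n",
           xdata_tail, padding, payload);
    return 0;
}

With those numbers the remainder is exactly 32768 bytes, i.e. a 32 KiB chunk of read data following the 40 bytes of xdata tail plus padding.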
 
 Maybe it's related to all that epoll stuff.
 
I'm currently using 4 epoll threads (this improves ec performance). I'll try to repeat the tests with a single epoll thread, but if the problem doesn't manifest I'm not sure that will be enough to draw any conclusion, since ec through fuse with 4 epoll threads doesn't seem to trigger the problem either.
 
Xavi
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
