From: "Sanoj Unnikrishnan" <sunnikri@xxxxxxxxxx>
To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>, "Ashish Pandey" <aspandey@xxxxxxxxxx>, xhernandez@xxxxxxxxxx,
"Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
Sent: Monday, September 12, 2016 7:06:59 PM
Subject: Need help with https://bugzilla.redhat.com/show_bug.cgi?id=1224180
Hello Xavi/Pranith,
I have been able to reproduce the BZ with the following steps:
gluster volume create v_disp disperse 6 redundancy 2 $tm1:/export/sdb/br1 \
    $tm2:/export/sdb/b2 $tm3:/export/sdb/br3 $tm1:/export/sdb/b4 \
    $tm2:/export/sdb/b5 $tm3:/export/sdb/b6 force
# (Used only 3 nodes; this should not matter here.)
gluster volume start v_disp
mount -t glusterfs $tm1:v_disp /gluster_vols/v_disp
mkdir /gluster_vols/v_disp/dir1
dd if=/dev/zero of=/gluster_vols/v_disp/dir1/x bs=10k count=90000 &
gluster v quota v_disp enable
gluster v quota v_disp limit-usage /dir1 200MB
gluster v quota v_disp soft-timeout 0
gluster v quota v_disp hard-timeout 0
# Optional: bring down 2 bricks (the issue reproduces more often with this)
#pgrep glusterfsd | xargs kill -9
An I/O error is reported on stdout when the quota is exceeded, followed by
"Disk quota exceeded".
Also note that the issue is seen when a flush happens at the same time as the
quota limit is hit, so it is seen only on some runs.
The following errors appear in the logs:
[2016-09-12 10:40:02.431568] E [MSGID: 122034]
[ec-common.c:488:ec_child_select] 0-v_disp-disperse-0: Insufficient
available childs for this request (have 0, need 4)
[2016-09-12 10:40:02.431627] E [MSGID: 122037]
[ec-common.c:1830:ec_update_size_version_done] 0-Disperse: sku-debug:
pre-version=0/0, size=0post-version=1865/1865, size=209571840
[2016-09-12 10:40:02.431637] E [MSGID: 122037]
[ec-common.c:1835:ec_update_size_version_done] 0-v_disp-disperse-0: Failed
to update version and size [Input/output error]
[2016-09-12 10:40:02.431664] E [MSGID: 122034]
[ec-common.c:417:ec_child_select] 0-v_disp-disperse-0: sku-debug: mask: 36,
ec->xl_up 36, ec->node_mask 3f, parent->mask:36, fop->parent->healing:0,
id:29
[2016-09-12 10:40:02.431673] E [MSGID: 122034]
[ec-common.c:480:ec_child_select] 0-v_disp-disperse-0: sku-debug: mask: 36,
remaining: 36, healing: 0, ec->xl_up 36, ec->node_mask 3f, parent->mask:36,
num:4, minimum: 1, id:29
...
[2016-09-12 10:40:02.487302] W [fuse-bridge.c:2311:fuse_writev_cbk]
0-glusterfs-fuse: 41159: WRITE => -1
gfid=ee0b4aa1-1f44-486a-883c-acddc13ee318 fd=0x7f1d9c003edc (Input/output
error)
[2016-09-12 10:40:02.500151] W [MSGID: 122006]
[ec-combine.c:206:ec_iatt_combine] 0-v_disp-disperse-0: Failed to combine
iatt (inode: 9816911356190712600-9816911356190712600, links: 1-1, uid: 0-0,
gid: 0-0, rdev: 0-0, size: 52423680-52413440, mode: 100644-100644)
[2016-09-12 10:40:02.500188] N [MSGID: 122029]
[ec-combine.c:93:ec_combine_write] 0-v_disp-disperse-0: Mismatching iatt in
answers of 'WRITE'
[2016-09-12 10:40:02.504551] W [MSGID: 122006]
[ec-combine.c:206:ec_iatt_combine] 0-v_disp-disperse-0: Failed to combine
iatt (inode: 9816911356190712600-9816911356190712600, links: 1-1, uid: 0-0,
gid: 0-0, rdev: 0-0, size: 52423680-52413440, mode: 100644-100644)
....
....
[2016-09-12 10:40:02.571272] N [MSGID: 122029]
[ec-combine.c:93:ec_combine_write] 0-v_disp-disperse-0: Mismatching iatt in
answers of 'WRITE'
[2016-09-12 10:40:02.571510] W [MSGID: 122006]
[ec-combine.c:206:ec_iatt_combine] 0-v_disp-disperse-0: Failed to combine
iatt (inode: 9816911356190712600-9816911356190712600, links: 1-1, uid: 0-0,
gid: 0-0, rdev: 0-0, size: 52423680-52413440, mode: 100644-100644)
[2016-09-12 10:40:02.571544] N [MSGID: 122029]
[ec-combine.c:93:ec_combine_write] 0-v_disp-disperse-0: Mismatching iatt in
answers of 'WRITE'
[2016-09-12 10:40:02.571772] W [fuse-bridge.c:1290:fuse_err_cbk]
0-glusterfs-fuse: 41160: FLUSH() ERR => -1 (Input/output error)
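The two sizes in the repeated "Failed to combine iatt" messages differ by exactly one dd block (bs=10k, i.e. 10240 bytes). That hints, but does not prove, that some bricks answered the WRITE with one more 10k write applied than the others. A quick check with plain shell arithmetic:

```shell
#!/bin/sh
# Sizes copied from the "Failed to combine iatt" log lines above.
size_a=52423680
size_b=52413440
# dd was run with bs=10k, i.e. 10 * 1024 bytes per write.
bs=$(( 10 * 1024 ))

delta=$(( size_a - size_b ))
echo "size delta: ${delta} bytes"
echo "dd block size: ${bs} bytes"
if [ "$delta" -eq "$bs" ]; then
    echo "answers differ by exactly one dd write"
fi
```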
Also, for some fops before the write, I noticed that the fop->mask field is 0.
It is not clear why this happens.
[2016-09-12 10:40:02.431561] E [MSGID: 122034]
[ec-common.c:480:ec_child_select] 0-v_disp-disperse-0: sku-debug: mask: 0,
remaining: 0, healing: 0, ec->xl_up 36, ec->node_mask 3f, parent->mask:36,
num:0, minimum: 4, fop->id:34
[2016-09-12 10:40:02.431568] E [MSGID: 122034]
[ec-common.c:488:ec_child_select] 0-v_disp-disperse-0: Insufficient
available childs for this request (have 0, need 4)
[2016-09-12 10:40:02.431637] E [MSGID: 122037]
[ec-common.c:1835:ec_update_size_version_done] 0-v_disp-disperse-0: Failed
to update version and size [Input/output error]
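To make the masks in these logs easier to read, here is a small shell sketch with hypothetical helpers (not GlusterFS code) that decodes them. It assumes the masks are printed in hex (node_mask 3f = all six bricks) and that bit i maps to the i-th brick in the order passed to "gluster volume create"; both are assumptions about how ec prints and orders its children.

```shell
#!/bin/sh
# Hypothetical helpers for reading the ec masks in the logs; not GlusterFS code.
# ASSUMPTION: masks are printed in hex, and bit i is the i-th brick in the
# order passed to "gluster volume create" (bit 0 = $tm1:/export/sdb/br1).
bricks="br1 b2 br3 b4 b5 b6"

# disperse 6 redundancy 2: at least 6 - 2 = 4 bricks must answer.
needed=$(( 6 - 2 ))

# Number of set bits in a mask, i.e. how many bricks it selects.
popcount() {
    local mask=$(( $1 )) n=0
    while [ "$mask" -ne 0 ]; do
        n=$(( n + (mask & 1) ))
        mask=$(( mask >> 1 ))
    done
    echo "$n"
}

# Names of the bricks whose bit is set in a mask, space separated.
list_bricks() {
    local mask=$(( $1 )) i=0 out="" b
    for b in $bricks; do
        if [ $(( (mask >> i) & 1 )) -eq 1 ]; then
            out="${out:+$out }$b"
        fi
        i=$(( i + 1 ))
    done
    echo "$out"
}

# Print one mask with its brick count against the minimum required.
check() {
    local have
    have=$(popcount "$2")
    printf '%-14s mask=%-5s have=%s need=%s bricks=[%s]\n' \
        "$1" "$2" "$have" "$needed" "$(list_bricks "$2")"
}

check "ec->node_mask" 0x3f
check "ec->xl_up" 0x36
check "fop->mask" 0
```

Under those assumptions, ec->node_mask 3f covers all six bricks and ec->xl_up 36 leaves exactly 4 bricks up (b2, br3, b5, b6, i.e. the two bricks on $tm1 are down). That is the minimum for disperse 6 redundancy 2, so any fop whose mask drops to 0 selects no bricks at all, matching "have 0, need 4".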
Is the zero value of fop->mask related to the iatt mismatch?
Is there any scenario in which the write and flush fops race?
Please suggest how to proceed.
Thanks and Regards,
Sanoj