The following commit has broken the brick multiplexing regression job.
tests/bugs/bug-1371806_1.t has failed a couple of times. One of the
latest regression job reports is at
https://build.gluster.org/job/regression-test-with-multiplex/406/console
commit 9b4de61a136b8e5ba7bf0e48690cdb1292d0dee8
Author: Mohit Agrawal <moagrawa@xxxxxxxxxx>
Date: Fri May 12 21:12:47 2017 +0530
cluster/dht : User xattrs are not healed after brick stop/start
Problem: In a distributed volume, the custom extended attribute value
         of a directory is not displayed correctly after a brick is
         stopped and started, or after a new brick is added. If an
         extended attribute (user|acl|quota) is set on a directory
         while a brick is stopped, or before a brick is added, the
         attribute value is not updated on that brick once the brick
         starts.
Solution: First store the hashed subvol, or the subvol that has the
          internal xattr, in the inode ctx and treat it as the MDS
          subvol. When updating a custom xattr (user, quota, acl,
          selinux) on a directory, first check for the MDS in the
          inode ctx. If the MDS is not present in the inode ctx, throw
          an EINVAL error to the application; otherwise set the xattr
          on the MDS subvol with an internal xattr value of -1, and
          then try to update the attribute on the other non-MDS
          subvols as well. If the MDS subvol is down, throw the error
          "Transport endpoint is not connected". In
          dht_dir_lookup_cbk|dht_revalidate_cbk|dht_discover_complete,
          call dht_call_dir_xattr_heal to heal the custom extended
          attributes.
          In the case of a gnfs server, if the hashed subvol is not
          found based on the loc, wind a call on all subvols to update
          the xattr.
Fix: 1) Save the MDS subvol in the inode ctx.
     2) Check whether the MDS subvol is present in the inode ctx.
     3) If the MDS subvol is down, call unwind with error ENOTCONN;
        if it is up, set the new xattr "GF_DHT_XATTR_MDS" to -1 and
        wind a call on the other subvols.
     4) If the setxattr fop is successful on the non-MDS subvols,
        increment the value of the internal xattr by 1.
     5) At the time of directory lookup, check the value of the new
        xattr GF_DHT_XATTR_MDS.
     6) If the value is not 0 in dht_lookup_dir_cbk (or the other cbk
        functions), call the heal function to heal the user xattrs.
     7) Call syncop_setxattr on the hashed_subvol to reset the value
        of the xattr to 0 if the heal is successful on all subvols.
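
The counter handling in steps 2) to 7) can be pictured with a small toy
model. This is only a sketch in Python, not the DHT code from the patch:
the Subvol class, the MDS_XATTR key and both function names are
hypothetical, and it assumes the counter is incremented once, back to 0,
only when every non-MDS subvol accepted the setxattr.

```python
import errno

MDS_XATTR = "mds_counter"  # hypothetical stand-in for GF_DHT_XATTR_MDS


class Subvol:
    """Toy subvolume holding the xattrs of a single directory."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.xattrs = {}


def dir_setxattr(mds, subvols, key, value):
    """Return 0 on success or a negative errno (steps 2 to 4)."""
    if mds is None:
        return -errno.EINVAL      # no MDS stored in the inode ctx
    if not mds.up:
        return -errno.ENOTCONN    # MDS subvol is down
    mds.xattrs[key] = value
    mds.xattrs[MDS_XATTR] = -1    # mark the directory as possibly dirty
    all_ok = True
    for sv in subvols:
        if sv is mds:
            continue
        if sv.up:
            sv.xattrs[key] = value
        else:
            all_ok = False        # this subvol missed the update
    if all_ok:
        mds.xattrs[MDS_XATTR] += 1  # assumption: back to 0 means clean
    return 0


def dir_lookup(mds, subvols):
    """Heal from the MDS if its counter is non-zero (steps 5 to 7)."""
    if mds is None or not mds.up:
        return
    if mds.xattrs.get(MDS_XATTR, 0) == 0:
        return
    healed_all = True
    for sv in subvols:
        if sv is mds:
            continue
        if not sv.up:
            healed_all = False
            continue
        for k, v in mds.xattrs.items():
            if k != MDS_XATTR:
                sv.xattrs[k] = v
    if healed_all:
        mds.xattrs[MDS_XATTR] = 0  # reset only after a complete heal
```

In this model a setxattr issued while one non-MDS subvol is down leaves
the counter at -1, and the next lookup after that subvol comes back
copies the xattrs over and resets the counter to 0.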
Test : To reproduce the issue, follow the steps below:
     1) Create a distributed volume and create a mount point.
     2) Create some directories from the mount point:
        mkdir tmp{1..5}
     3) Kill any one brick of the volume.
     4) Set an extended attribute on the directories from the mount
        point:
        setfattr -n user.foo -v "abc" ./tmp{1..5}
        It will throw the error "Transport endpoint is not connected"
        for those directories whose hashed subvol is down.
     5) Start the volume with the force option to start the brick
        process.
     6) Execute the getfattr command on the mount point for the
        directories.
     7) Check the extended attribute on the brick:
        getfattr -n user.foo <volume-location>/tmp{1..5}
        It shows the correct value for the directories on which the
        xattr fop was executed successfully.
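
The check in step 7) can also be scripted. The helper below is a
hypothetical sketch in Python, not part of the patch or of the .t test:
find_mismatches and its parameters are made-up names, the brick paths
passed to it stand in for <volume-location>, and os.getxattr is only
available on Linux.

```python
import os


def find_mismatches(brick_roots, dirs, name, expected, read_xattr=None):
    """List (path, value) pairs whose xattr differs from `expected`.

    `read_xattr` defaults to os.getxattr, which returns bytes; a
    missing attribute or path is reported with a value of None.
    """
    if read_xattr is None:
        read_xattr = os.getxattr  # Linux only
    mismatches = []
    for root in brick_roots:
        for d in dirs:
            path = os.path.join(root, d)
            try:
                value = read_xattr(path, name)
            except OSError:
                value = None
            if value != expected:
                mismatches.append((path, value))
    return mismatches
```

An empty result for the directories on which setfattr succeeded would
correspond to the "correct value" outcome described in step 7).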
Note: The patch will resolve the xattr healing problem only for fuse
      mounts, not for nfs mounts.
BUG: 1371806
Signed-off-by: Mohit Agrawal <moagrawa@xxxxxxxxxx>
Change-Id: I4eb137eace24a8cb796712b742f1d177a65343d5