Problem listing files on geo-replicated volume after upgrade to 3.4.6

Hello all,

We have a problem on a geo-replicated volume after upgrading from GlusterFS 3.3.2 to 3.4.6 on Ubuntu 12.04.5 LTS. For example, an 'ls -l' on the mounted geo-replicated volume does not show the entire content, while the same command on the underlying bricks does.

The events in chronological order:

We are running a 6-node distributed-replicated volume (vol1) which is geo-replicated to a 4-node distributed-replicated volume (vol2). Disk space on vol2 became insufficient, so we needed to add two further nodes.
At that point, vol1 and vol2 were running on Ubuntu 12.04 LTS / GlusterFS 3.3.2.
We stopped the geo-replication, stopped vol2, and updated the nodes of vol2 to the latest Ubuntu 12.04.5 release (dist-upgrade) and to GlusterFS 3.4.6. All Gluster clients which make use of vol2 were also updated from glusterfs-client 3.3.2 to 3.4.6. Then we added two further bricks to vol2, at the same software level as the first four nodes (Ubuntu 12.04.5 LTS, GlusterFS 3.4.6), and started vol2 again. Afterwards we started a rebalance on vol2 and the geo-replication on the master node of vol1. A check script on the geo-replication master copies/deletes a test file on vol1 depending on the existence of that file on vol2. Everything seemed to be OK so far...
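
For reference, the expansion itself was roughly the following; hostnames and brick paths here are placeholders, not our real ones:

# add the new replica pair to vol2 (replica 2, so bricks go in pairs)
gluster volume add-brick vol2 node5:/gluster-export node6:/gluster-export

# redistribute the existing data across all bricks and watch progress
gluster volume rebalance vol2 start
gluster volume rebalance vol2 status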

After the rebalance process finished (without errors) we observed an abnormality on vol2: the data is distributed unevenly. The first two brick pairs show a usage of about 80%, while the newly added pair shows a usage of about 50%. We restarted the rebalance twice, but nothing changed. However, more critical than that is the fact that since the update and expansion of vol2 we cannot see/access all files by default on the mounted vol2, while the files are visible in their brick directories...
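
The per-brick usage can be seen with, for example (the brick path on our nodes is /gluster-export):

# per-brick disk usage as reported by gluster
gluster volume status vol2 detail

# or directly on each node
df -h /gluster-export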

Example 1:

vol1 contains 446 files/directories, among them the directory 1051.
vol1 is mounted on /sdn:
[ 15:54:28 ] - root@vol1  /sdn $ls -l | wc -l
446
[ 15:55:06 ] - root@vol1  /sdn $ls -l | grep 1051
drwxrwxrwx  5  1007  1013   12288 Jan 22 07:42 1051
[ 15:55:46 ] - root@vol1  /sdn $du -ks 1051
5588129    1051
[ 15:56:03 ] - root@vol1  /sdn $

vol2 contains 304 files/directories, but 1051 is not listed. When I run a 'du -ks /sdn/1051' or an 'ls -l /sdn/1051' on vol2, the directory becomes visible...
vol2 is mounted on /sdn:
[ 15:54:35 ] - root@vol2  /sdn $ls | wc -l
304
[ 15:56:19 ] - root@vol2  /sdn $ls -l | grep 1051
[ 15:56:28 ] - root@vol2  /sdn $du -ks 1051
5588001    1051
[ 15:56:43 ] - root@vol2  /sdn $ls -l | grep 1051
drwxrwxrwx  5  1007  1013   8255 Apr 17 15:56 1051
[ 15:56:59 ] - root@vol2  /sdn $ls | wc -l
305

Example 2:
Directory 2098 is visible on the brick but not on the Gluster volume.
After listing the named directory, it is visible on the Gluster volume again.
[ 16:11:00 ] - root@vol2  /sdn $ls | grep 2098
[ 16:12:21 ] - root@vol2  /sdn $ls -l /gluster-export/ | grep 2098
drwxrwxrwx  4  1015  1013   4096 Jan 18 03:07 2098

[ 16:12:28 ] - root@vol2  /sdn $ls -l /sdn/2098
...
[ 16:13:12 ] - root@vol2  /sdn $ls -l | grep 2098
drwxrwxrwx  4  1015  1013   8237 Apr 17 16:13 2098
[ 16:13:27 ] - root@vol2  /sdn $
[ 16:13:27 ] - root@vol2  /sdn $ls | wc -l
306
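
Since an explicit lookup makes each missing directory reappear, a brute-force workaround might be to force a lookup through the mount point for every top-level entry that exists on a brick. A rough, untested sketch:

# trigger a named lookup via the mount for each directory on the brick;
# the lookup appears to heal the missing entry on the volume
for d in /gluster-export/*; do
    stat "/sdn/$(basename "$d")" > /dev/null
done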

I did not find helpful hints in the Gluster logs. Currently I'm frequently seeing the following messages, but the missing directories on vol2 are not mentioned:

vol2 :
 $tail -f sdn.log
[2015-04-17 14:00:14.816730] I [dht-layout.c:726:dht_layout_dir_mismatch] 1-aut-wien-01-dht: /1011 - disk layout missing
[2015-04-17 14:00:14.816745] I [dht-common.c:638:dht_revalidate_cbk] 1-aut-wien-01-dht: mismatching layouts for /1011
[2015-04-17 14:00:14.817590] I [dht-layout.c:726:dht_layout_dir_mismatch] 1-aut-wien-01-dht: /1005 - disk layout missing
[2015-04-17 14:00:14.817602] I [dht-common.c:638:dht_revalidate_cbk] 1-aut-wien-01-dht: mismatching layouts for /1005
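
If I read these messages correctly, the on-disk DHT layout xattr is missing for those directories on at least one brick. I assume this can be checked directly on a brick like this:

# dump all xattrs (hex-encoded) of a problematic directory on the brick;
# trusted.glusterfs.dht holds the DHT layout
getfattr -m . -d -e hex /gluster-export/1011

If trusted.glusterfs.dht is really missing on the new bricks, perhaps a 'gluster volume rebalance vol2 fix-layout start' would recreate it, but I have not tried that yet.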

The amount of data on vol1 is slightly smaller than on vol2. All nodes use the same disk configuration and all bricks are XFS-formatted.
df -m :
vol1:/vol1  57217563 39230421  17987143   69% /sdn
vol2:/vol2  57217563 40399541  16818023  71% /sdn

Currently I'm confused because I don't know the reason for this behaviour.
I guess it was not a good idea to update the geo-replication slave to 3.4.6 while the master is still running 3.3.2, but I'm not sure. Possibly there is an issue in 3.4.6 itself, and geo-replication has no influence on it at all.
For the time being, I have stopped the geo-replication.
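
For completeness, stopping was done with the usual geo-replication CLI; the slave URL below is a placeholder:

# stop the master->slave session and verify it is stopped
gluster volume geo-replication vol1 slave-host::vol2 stop
gluster volume geo-replication vol1 slave-host::vol2 status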
Can somebody point me to the cause, or give helpful hints on what to try next?

best regards
dietmar

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users



