One correction...
After running slave-upgrade.sh on gluster-wien-07, the folder 1050, for example, has a trusted.gfid assigned only on subvolume replicate-0, but, contrary to what I stated in the last mail, this is a completely wrong gfid and does not appear in master_gfid_file.txt.
[ 13:05:07 ] - root@gluster-wien-02 /usr/share/glusterfs/scripts
$getfattr -m . -d -e hex /gluster-export/1050
getfattr: Removing leading '/' from absolute path names
# file: gluster-export/1050
trusted.afr.aut-wien-01-client-0=0x000000000000000000000000
trusted.afr.aut-wien-01-client-1=0x000000000000000000000000
trusted.gfid=0x564d2217600b4e9c9ab5b34c53b1841c
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x5638bfb5000379c0
trusted.glusterfs.dht=0x00000001000000000000000055555554
[ 13:11:31 ] - root@gluster-wien-02 /usr/share/glusterfs/scripts $
[ 12:59:40 ] - root@gluster-wien-07 /usr/share/glusterfs/scripts $grep 564d2217-600b-4e9c-9ab5-b34c53b1841c /tmp/master_gfid_file.txt
[ 13:12:30 ] - root@gluster-wien-07 /usr/share/glusterfs/scripts $grep d4815ee4-3348-4105-9136-d0219d956ed8 /tmp/master_gfid_file.txt
d4815ee4-3348-4105-9136-d0219d956ed8
1050="d4815ee4-3348-4105-9136-d0219d956ed8"
[ 13:12:36 ] - root@gluster-wien-07 /usr/share/glusterfs/scripts $
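For reference, a small loop like the one below could run the same comparison on all slave nodes at once (just a sketch; the host list, the brick path and the location of master_gfid_file.txt are assumptions on my side):

DIR=1050
# expected gfid for the directory, taken from the master gfid list and
# converted to the hex form that getfattr prints
EXPECTED=0x$(grep "^${DIR}=" /tmp/master_gfid_file.txt | cut -d'"' -f2 | tr -d '-')
for h in gluster-wien-0{2..7}-int; do
    echo -n "$h: "
    ssh root@$h "getfattr -n trusted.gfid -e hex /gluster-export/$DIR 2>/dev/null | grep trusted.gfid" || echo "no trusted.gfid"
done
echo "expected: trusted.gfid=$EXPECTED"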
This confuses me: slave-upgrade.sh removes everything under .glusterfs and does a setfattr -x on everything in the brick directory, but then apparently assigns a random gfid?
When I ran slave-upgrade.sh on gluster-wien-02, the trusted.gfid was missing on four nodes, but at least on the remaining two nodes the gfid for 1050 was the same as on the master volume.
I'll try it again on wien-02..
best regards
dietmar
On 22.12.2015 at 11:47, Dietmar Putz wrote:
Hi Saravana,
thanks for your reply...
All gluster nodes run Ubuntu 14.04 with AppArmor. Even though it is running without any relevant configuration, I have unloaded the module to rule out any influence.
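(Roughly like this, for completeness; if I remember the Ubuntu init script correctly, the teardown action unloads all profiles:)

aa-status                        # show loaded profiles and confined processes
service apparmor teardown        # unload all AppArmor profiles
update-rc.d -f apparmor remove   # keep them from being loaded again at boot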
I have stopped and deleted the geo-replication once more and ran slave-upgrade.sh again, this time on gluster-wien-07; geo-replication has not been started again yet.
The result is the same as before, and the problem is more widespread than I first realized...
I have checked all directories in the root of each brick for a trusted.gfid (567 directories).
Only on subvolume aut-wien-01-replicate-0 does every directory have a trusted.gfid assigned.
On subvolumes ~replicate-1 and ~replicate-2, only 186 resp. 206 of the 567 directories have a trusted.gfid assigned.
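For reference, a count like the following per brick should give the same numbers (a sketch; the brick path is an assumption):

cd /gluster-export
# top-level directories that do have a trusted.gfid on this brick
find . -mindepth 1 -maxdepth 1 -type d ! -name .glusterfs \
  -exec getfattr -n trusted.gfid -e hex {} \; 2>/dev/null | grep -c '^trusted.gfid='
# total number of top-level directories, for comparison
find . -mindepth 1 -maxdepth 1 -type d ! -name .glusterfs | wc -l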
Take for example the directory /gluster-export/1050, which already appeared in the geo-replication logs before...
The screenlog of slave-upgrade.sh shows a failed setxattr on 1050, although this folder exists and contains data/folders on each subvolume.
[ 09:50:43 ] - root@gluster-wien-07 /usr/share/glusterfs/scripts $grep 1050 screenlog.0 | head -3
setxattr on ./1050="d4815ee4-3348-4105-9136-d0219d956ed8" failed (No such file or directory)
setxattr on 1050/recordings="6056c887-99bc-4fcc-bf39-8ea2478bb780" failed (No such file or directory)
setxattr on 1050/recordings/REC_22_3619210_63112.mp4="63d127a3-a387-4cb6-bb4b-792dc422ebbf" failed (No such file or directory)
[ 09:50:53 ] - root@gluster-wien-07 /usr/share/glusterfs/scripts $
[ 10:11:01 ] - root@gluster-wien-07 /gluster-export $getfattr -m . -d -e hex 1050
# file: 1050
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x5638bfb5000379c0
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9
[ 10:11:10 ] - root@gluster-wien-07 /gluster-export $ls -li | grep 1050
17179869881 drwxr-xr-x 72 1009 admin 4096 Dec 2 21:34 1050
[ 10:11:21 ] - root@gluster-wien-07 /gluster-export $du -hs 1050
877G 1050
[ 10:11:29 ] - root@gluster-wien-07 /gluster-export $
As far as I understand, folder 1050 and many other folders should have a unique trusted.gfid assigned, as they do on all master nodes resp. on subvolume aut-wien-01-replicate-0.
Does it make sense to start the geo-replication again, or does this issue need to be fixed before starting another attempt...?
...and if so, does anybody know how to fix the missing trusted.gfid? Just restarting slave-upgrade.sh did not help.
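In case it helps the discussion: since the expected gfid is known from master_gfid_file.txt, my understanding is that it could be set again directly on the brick backend with setfattr, the value being the UUID without dashes and prefixed with 0x. I'm not sure this is safe or complete (the corresponding entry under .glusterfs would presumably still be missing), so please correct me if this is the wrong approach:

# sketch only: re-apply the gfid the master expects for /gluster-export/1050
setfattr -n trusted.gfid -v 0xd4815ee4334841059136d0219d956ed8 /gluster-export/1050
# verify
getfattr -n trusted.gfid -e hex /gluster-export/1050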
any help is appreciated.
best regards
dietmar
volume aut-wien-01-client-0 remote-host gluster-wien-02-int
volume aut-wien-01-client-1 remote-host gluster-wien-03-int
volume aut-wien-01-client-2 remote-host gluster-wien-04-int
volume aut-wien-01-client-3 remote-host gluster-wien-05-int
volume aut-wien-01-client-4 remote-host gluster-wien-06-int
volume aut-wien-01-client-5 remote-host gluster-wien-07-int
volume aut-wien-01-replicate-0 subvolumes aut-wien-01-client-0 aut-wien-01-client-1
volume aut-wien-01-replicate-1 subvolumes aut-wien-01-client-2 aut-wien-01-client-3
volume aut-wien-01-replicate-2 subvolumes aut-wien-01-client-4 aut-wien-01-client-5
volume glustershd
type debug/io-stats
subvolumes aut-wien-01-replicate-0 aut-wien-01-replicate-1 aut-wien-01-replicate-2
end-volume
On 21.12.2015 at 08:08, Saravanakumar Arumugam wrote:
Hi,
Replies inline..
Thanks,
Saravana
On 12/18/2015 10:02 PM, Dietmar Putz wrote:
Hello again...
After running into serious trouble with an XFS issue in kernels 3.13.0-x and 3.19.0-39, which we 'solved' by downgrading to 3.8.4 (http://comments.gmane.org/gmane.comp.file-systems.xfs.general/71629), we decided to start a new geo-replication attempt from scratch...
We deleted the former geo-replication session and set up a new one as described in:
http://www.gluster.org/community/documentation/index.php/Upgrade_to_3.6
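(From memory, the session was recreated with the usual commands, so the exact options below might differ slightly from what we actually typed:)

gluster volume geo-replication ger-ber-01 gluster-wien-02::aut-wien-01 stop
gluster volume geo-replication ger-ber-01 gluster-wien-02::aut-wien-01 delete
gluster system:: execute gsec_create
gluster volume geo-replication ger-ber-01 gluster-wien-02::aut-wien-01 create push-pem force
gluster volume geo-replication ger-ber-01 gluster-wien-02::aut-wien-01 start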
Master and slave are distributed-replicated volumes running on gluster 3.6.7 / Ubuntu 14.04.
The setup worked as described, but unfortunately geo-replication isn't syncing any files and remains in the status shown below.
In the ~geo-replication-slaves/...gluster.log I can find messages like the following on all slave nodes:
[2015-12-16 15:06:46.837748] W [dht-layout.c:180:dht_layout_search] 0-aut-wien-01-dht: no subvolume for hash (value) = 1448787070
[2015-12-16 15:06:46.837789] W [fuse-bridge.c:1261:fuse_err_cbk] 0-glusterfs-fuse: 74203: SETXATTR() /.gfid/d4815ee4-3348-4105-9136-d0219d956ed8 => -1 (No such file or directory)
[2015-12-16 15:06:47.090212] I [dht-layout.c:663:dht_layout_normalize] 0-aut-wien-01-dht: Found anomalies in (null) (gfid = d4815ee4-3348-4105-9136-d0219d956ed8). Holes=1 overlaps=0
[2015-12-16 20:25:55.327874] W [fuse-bridge.c:1967:fuse_create_cbk] 0-glusterfs-fuse: 199968: /.gfid/603de79d-8d41-44bd-845e-3727cf64a617 => -1 (Operation not permitted)
[2015-12-16 20:25:55.617016] W [fuse-bridge.c:1967:fuse_create_cbk] 0-glusterfs-fuse: 199971: /.gfid/8622fb7d-8909-42de-adb5-c67ed6f006c0 => -1 (Operation not permitted)
Please check whether SELinux is enabled on both master and slave.. I remember seeing such errors when SELinux is enabled.
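For example, something like this on both sides should tell (getenforce/sestatus are only available if the SELinux userspace tools are installed at all):

getenforce    # prints Enforcing / Permissive / Disabled
sestatus      # more detailed status, if the SELinux tools are installed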
The following is found only on gluster-wien-03-int, which is in 'Hybrid Crawl':
[2015-12-16 17:17:07.219939] W [fuse-bridge.c:1261:fuse_err_cbk] 0-glusterfs-fuse: 123841: SETXATTR() /.gfid/00000000-0000-0000-0000-000000000001 => -1 (File exists)
[2015-12-16 17:17:07.220658] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-aut-wien-01-client-3: remote operation failed: File exists. Path: /2301
[2015-12-16 17:17:07.220702] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-aut-wien-01-client-2: remote operation failed: File exists. Path: /2301
Some errors like "file exists" can be ignored.
But first of all I would like to have a look at this message, found about 6000 times on gluster-wien-05-int and ~07-int, which are in 'History Crawl':
[2015-12-16 13:03:25.658359] W [fuse-bridge.c:483:fuse_entry_cbk] 0-glusterfs-fuse: 119569: LOOKUP() /.gfid/d4815ee4-3348-4105-9136-d0219d956ed8/.dstXXXfDyaP9 => -1 (Stale file handle)
As the entry 1050="d4815ee4-3348-4105-9136-d0219d956ed8" shows, the gfid d4815ee4-3348-4105-9136-d0219d956ed8 belongs to the folder 1050 in the brick directory.
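(On a brick that still has its .glusterfs tree, e.g. on the master, the directory behind a gfid can also be looked up via the symlink under .glusterfs; the first two byte pairs of the gfid give the two subdirectory names:)

GFID=d4815ee4-3348-4105-9136-d0219d956ed8
# for directories this entry is a symlink whose target ends in the directory name
ls -l /gluster-export/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID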
Every brick in the master volume looks like this one...:
Host : gluster-ger-ber-12-int
# file: gluster-export/1050
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.ger-ber-01-client-0=0x000000000000000000000000
trusted.afr.ger-ber-01-client-1=0x000000000000000000000000
trusted.afr.ger-ber-01-client-2=0x000000000000000000000000
trusted.afr.ger-ber-01-client-3=0x000000000000000000000000
trusted.gfid=0xd4815ee4334841059136d0219d956ed8
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.1c31dc4d-7ee3-423b-8577-c7b0ce2e356a.stime=0x56606290000c7e4e
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x567428e000042116
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9
On the slave volume, only the bricks of wien-02 and wien-03 have the same trusted.gfid:
Host : gluster-wien-03
# file: gluster-export/1050
trusted.afr.aut-wien-01-client-0=0x000000000000000000000000
trusted.afr.aut-wien-01-client-1=0x000000000000000000000000
trusted.gfid=0xd4815ee4334841059136d0219d956ed8
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x5638bfb5000379c0
trusted.glusterfs.dht=0x00000001000000000000000055555554
None of the nodes in 'History Crawl' have this trusted.gfid assigned:
Host : gluster-wien-05
# file: gluster-export/1050
trusted.afr.aut-wien-01-client-2=0x000000000000000000000000
trusted.afr.aut-wien-01-client-3=0x000000000000000000000000
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x5638bfb5000379c0
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff
I'm not sure whether this is normal or whether that trusted.gfid should have been assigned on all slave nodes by the slave-upgrade.sh script.
As per the doc, it applies the gfids on all slave nodes.
'bash slave-upgrade.sh localhost:<aut-wien-01> /tmp/master_gfid_file.txt $PWD/gsync-sync-gfid' was run on wien-02, which has passwordless login to every other slave node.
As far as I could see in the process list, slave-upgrade.sh was running on each slave node and, if I remember correctly, starts with an 'rm -rf ~/.glusterfs/...'
So the mentioned gfid should have been removed by slave-upgrade.sh, but should the trusted.gfid also be re-assigned by the script?
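If it helps: assuming the file really contains one path="gfid" entry per line, as the screenlog suggests, a rough check like this on a slave brick should show which of the listed paths even exist there (brick path assumed):

# list paths from the master gfid file that do not exist on this brick
grep '="' /tmp/master_gfid_file.txt | sed -e 's/="[^"]*"$//' -e 's|^\./||' |
while read -r p; do
    [ -e "/gluster-export/$p" ] || echo "missing: $p"
done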
...I'm confused,
Is the 'Stale file handle' message caused by the missing trusted.gfid for /gluster-export/1050/ on the nodes where the message appears?
Does it make sense to stop geo-rep and run the slave-upgrade.sh script on just the affected nodes, without access to the other nodes, to fix this?
Currently I'm not sure whether the 'Stale file handle' messages are what prevents us from getting geo-replication running, but I guess the best approach is to get it working step by step...
any help is appreciated.
best regards
dietmar
[ 14:45:42 ] - root@gluster-ger-ber-07 /var/log/glusterfs/geo-replication/ger-ber-01 $gluster volume geo-replication ger-ber-01 gluster-wien-02::aut-wien-01 status detail
MASTER NODE         MASTER VOL   MASTER BRICK      SLAVE                              STATUS    CHECKPOINT STATUS   CRAWL STATUS    FILES SYNCD   FILES PENDING   BYTES PENDING   DELETES PENDING   FILES SKIPPED
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
gluster-ger-ber-07  ger-ber-01   /gluster-export   gluster-wien-07-int::aut-wien-01   Active    N/A                 History Crawl   -6500         0               0               5                 6500
gluster-ger-ber-12  ger-ber-01   /gluster-export   gluster-wien-06-int::aut-wien-01   Passive   N/A                 N/A             0             0               0               0                 0
gluster-ger-ber-11  ger-ber-01   /gluster-export   gluster-wien-03-int::aut-wien-01   Active    N/A                 Hybrid Crawl    0             8191            0               0                 0
gluster-ger-ber-09  ger-ber-01   /gluster-export   gluster-wien-05-int::aut-wien-01   Active    N/A                 History Crawl   -5792         0               0               0                 5793
gluster-ger-ber-10  ger-ber-01   /gluster-export   gluster-wien-02-int::aut-wien-01   Passive   N/A                 N/A             0             0               0               0                 0
gluster-ger-ber-08  ger-ber-01   /gluster-export   gluster-wien-04-int::aut-wien-01   Passive   N/A                 N/A             0             0               0               0                 0
[ 14:45:46 ] - root@gluster-ger-ber-07 /var/log/glusterfs/geo-replication/ger-ber-01 $
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users