Re: geo-replication 3.6.7 - no trusted.gfid on some slave nodes - stale file handle

One correction: after running slave-upgrade.sh on gluster-wien-07, the folder 1050 has a trusted.gfid assigned only on subvolume replicate-0, but as stated in the last mail, it is a completely wrong gfid and does not appear in master_gfid_file.txt.


[ 13:05:07 ] - root@gluster-wien-02 /usr/share/glusterfs/scripts $getfattr -m . -d -e hex /gluster-export/1050
getfattr: Removing leading '/' from absolute path names
# file: gluster-export/1050
trusted.afr.aut-wien-01-client-0=0x000000000000000000000000
trusted.afr.aut-wien-01-client-1=0x000000000000000000000000
trusted.gfid=0x564d2217600b4e9c9ab5b34c53b1841c
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x5638bfb5000379c0
trusted.glusterfs.dht=0x00000001000000000000000055555554

[ 13:11:31 ] - root@gluster-wien-02 /usr/share/glusterfs/scripts $
[ 12:59:40 ] - root@gluster-wien-07 /usr/share/glusterfs/scripts $grep 564d2217-600b-4e9c-9ab5-b34c53b1841c /tmp/master_gfid_file.txt
[ 13:12:30 ] - root@gluster-wien-07 /usr/share/glusterfs/scripts $grep d4815ee4-3348-4105-9136-d0219d956ed8 /tmp/master_gfid_file.txt
d4815ee4-3348-4105-9136-d0219d956ed8 1050="d4815ee4-3348-4105-9136-d0219d956ed8"
[ 13:12:36 ] - root@gluster-wien-07  /usr/share/glusterfs/scripts $
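
Since master_gfid_file.txt uses the dashed uuid notation while getfattr prints raw hex, the two can be compared with a small conversion like the following (an awk/sed helper of my own, not gluster tooling):

getfattr -n trusted.gfid -e hex /gluster-export/1050 2>/dev/null \
  | awk -F'0x' '/^trusted.gfid=/ {print $2}' \
  | sed -E 's/^(.{8})(.{4})(.{4})(.{4})(.{12})$/\1-\2-\3-\4-\5/'
# prints 564d2217-600b-4e9c-9ab5-b34c53b1841c for the folder above,
# ready to grep in /tmp/master_gfid_file.txt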

This confuses me: slave-upgrade.sh removes everything in ~/.glusterfs, runs setfattr -x on everything in the brick directory, and then apparently assigns a random gfid? When I ran slave-upgrade on gluster-wien-02, the trusted.gfid was missing on four nodes, but at least on the remaining two nodes the gfid for 1050 was the same as on the master volume.
I'll try it again on wien-02..

best regards
dietmar



On 22.12.2015 at 11:47, Dietmar Putz wrote:
Hi Saravana,

thanks for your reply...
All gluster nodes run Ubuntu 14.04 with AppArmor. Even though it runs without any configuration, I have unloaded the module to rule out any influence.

I have stopped and deleted geo-replication once more and ran slave-upgrade.sh again, this time on gluster-wien-07; geo-replication has not been started again yet. The result is the same as before, and more comprehensive than what I first identified: I checked all 567 directories in the root of each brick for a trusted.gfid. Only on subvolume aut-wien-01-replicate-0 does every directory have a trusted.gfid assigned; on subvolumes ~replicate-1 and ~replicate-2 only 186 resp. 206 of the 567 directories have one.
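
These numbers can be reproduced per brick with a loop like the following (a minimal sketch, assuming the brick root /gluster-export as in our setup):

total=0; missing=0
for d in /gluster-export/*/; do
    total=$((total + 1))
    # getfattr exits non-zero when the xattr is absent
    getfattr -n trusted.gfid "$d" >/dev/null 2>&1 \
        || { missing=$((missing + 1)); echo "no trusted.gfid: $d"; }
done
echo "$missing of $total directories without trusted.gfid"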

Take for example the directory /gluster-export/1050, which already appeared in the geo-replication logs: the screenlog of slave-upgrade.sh shows a failed setxattr for 1050, yet this folder exists and contains data and subfolders on each subvolume.

[ 09:50:43 ] - root@gluster-wien-07 /usr/share/glusterfs/scripts $grep 1050 screenlog.0 | head -3
setxattr on ./1050="d4815ee4-3348-4105-9136-d0219d956ed8" failed (No such file or directory)
setxattr on 1050/recordings="6056c887-99bc-4fcc-bf39-8ea2478bb780" failed (No such file or directory)
setxattr on 1050/recordings/REC_22_3619210_63112.mp4="63d127a3-a387-4cb6-bb4b-792dc422ebbf" failed (No such file or directory)
[ 09:50:53 ] - root@gluster-wien-07 /usr/share/glusterfs/scripts $

[ 10:11:01 ] - root@gluster-wien-07 /gluster-export $getfattr -m . -d -e hex 1050
# file: 1050
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x5638bfb5000379c0
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9

[ 10:11:10 ] - root@gluster-wien-07  /gluster-export $ls -li | grep 1050
 17179869881 drwxr-xr-x 72  1009 admin   4096 Dec  2 21:34 1050
[ 10:11:21 ] - root@gluster-wien-07  /gluster-export $du -hs 1050
877G    1050
[ 10:11:29 ] - root@gluster-wien-07  /gluster-export $

As far as I understand, folder 1050 and many other folders should have a unique trusted.gfid assigned, as they do on all master nodes resp. on subvolume aut-wien-01-replicate-0. Does it make sense to start geo-replication again, or does this issue need to be fixed before another attempt? And if so, does anybody know how to fix the missing trusted.gfid? Just re-running slave-upgrade did not help.
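
The only manual workaround that comes to mind would be to write the gfid from master_gfid_file.txt directly onto the brick directory, roughly as sketched below. This is untested and I don't know whether it is safe on a running volume, so treat it as an assumption rather than a recommendation:

# UNTESTED sketch: re-apply the gfid from master_gfid_file.txt by hand;
# the hex bytes must match the dashed uuid d4815ee4-3348-4105-9136-d0219d956ed8
setfattr -n trusted.gfid -v 0xd4815ee4334841059136d0219d956ed8 /gluster-export/1050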

Any help is appreciated.

best regards
dietmar



volume aut-wien-01-client-0 remote-host gluster-wien-02-int
volume aut-wien-01-client-1 remote-host gluster-wien-03-int
volume aut-wien-01-client-2 remote-host gluster-wien-04-int
volume aut-wien-01-client-3 remote-host gluster-wien-05-int
volume aut-wien-01-client-4 remote-host gluster-wien-06-int
volume aut-wien-01-client-5 remote-host gluster-wien-07-int
volume aut-wien-01-replicate-0 subvolumes aut-wien-01-client-0 aut-wien-01-client-1
volume aut-wien-01-replicate-1 subvolumes aut-wien-01-client-2 aut-wien-01-client-3
volume aut-wien-01-replicate-2 subvolumes aut-wien-01-client-4 aut-wien-01-client-5
volume glustershd
    type debug/io-stats
subvolumes aut-wien-01-replicate-0 aut-wien-01-replicate-1 aut-wien-01-replicate-2
end-volume


On 21.12.2015 at 08:08, Saravanakumar Arumugam wrote:
Hi,
Replies inline..

Thanks,
Saravana

On 12/18/2015 10:02 PM, Dietmar Putz wrote:
Hello again...

After having some big trouble with an xfs issue in kernels 3.13.0-x and 3.19.0-39, which we 'solved' by downgrading to 3.8.4 (http://comments.gmane.org/gmane.comp.file-systems.xfs.general/71629),
we decided to start a new geo-replication attempt from scratch.
We deleted the former geo-replication session and started a new one as described in:
http://www.gluster.org/community/documentation/index.php/Upgrade_to_3.6

Master and slave are distributed replicated volumes running gluster 3.6.7 on Ubuntu 14.04. The setup worked as described, but unfortunately geo-replication isn't syncing files and remains in the status shown below.

In the ~geo-replication-slaves/...gluster.log I find messages like these on all slave nodes:

[2015-12-16 15:06:46.837748] W [dht-layout.c:180:dht_layout_search] 0-aut-wien-01-dht: no subvolume for hash (value) = 1448787070
[2015-12-16 15:06:46.837789] W [fuse-bridge.c:1261:fuse_err_cbk] 0-glusterfs-fuse: 74203: SETXATTR() /.gfid/d4815ee4-3348-4105-9136-d0219d956ed8 => -1 (No such file or directory)
[2015-12-16 15:06:47.090212] I [dht-layout.c:663:dht_layout_normalize] 0-aut-wien-01-dht: Found anomalies in (null) (gfid = d4815ee4-3348-4105-9136-d0219d956ed8). Holes=1 overlaps=0

[2015-12-16 20:25:55.327874] W [fuse-bridge.c:1967:fuse_create_cbk] 0-glusterfs-fuse: 199968: /.gfid/603de79d-8d41-44bd-845e-3727cf64a617 => -1 (Operation not permitted)
[2015-12-16 20:25:55.617016] W [fuse-bridge.c:1967:fuse_create_cbk] 0-glusterfs-fuse: 199971: /.gfid/8622fb7d-8909-42de-adb5-c67ed6f006c0 => -1 (Operation not permitted)
Please check whether SELinux is enabled on both master and slave. I remember seeing such errors when SELinux is enabled.
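
You can verify it on both master and slave nodes, e.g.:

getenforce 2>/dev/null || echo "no selinux tools installed"   # SELinux state, if installed
apparmor_status | head -5                                     # the AppArmor side on Ubuntu (needs root)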


This is found only on gluster-wien-03-int, which is in 'Hybrid Crawl':
[2015-12-16 17:17:07.219939] W [fuse-bridge.c:1261:fuse_err_cbk] 0-glusterfs-fuse: 123841: SETXATTR() /.gfid/00000000-0000-0000-0000-000000000001 => -1 (File exists)
[2015-12-16 17:17:07.220658] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-aut-wien-01-client-3: remote operation failed: File exists. Path: /2301
[2015-12-16 17:17:07.220702] W [client-rpc-fops.c:306:client3_3_mkdir_cbk] 0-aut-wien-01-client-2: remote operation failed: File exists. Path: /2301

Some errors like "file exists" can be ignored.

But first of all I would like to look at this message, found about 6000 times on gluster-wien-05-int and ~07-int, the two nodes in 'History Crawl':

[2015-12-16 13:03:25.658359] W [fuse-bridge.c:483:fuse_entry_cbk] 0-glusterfs-fuse: 119569: LOOKUP() /.gfid/d4815ee4-3348-4105-9136-d0219d956ed8/.dstXXXfDyaP9 => -1 (Stale file handle)

As shown above, the gfid d4815ee4-3348-4105-9136-d0219d956ed8 belongs to the folder 1050 in the brick directory.
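
For reference, this mapping can also be verified on the brick itself: gluster keeps a gfid namespace under <brick>/.glusterfs, indexed by the first two byte pairs of the uuid, and directory gfids are symlinks into their parent's gfid directory:

GFID=d4815ee4-3348-4105-9136-d0219d956ed8
ls -l /gluster-export/.glusterfs/d4/81/$GFID
# for the top-level folder 1050 this should be a symlink like
# ../../00/00/00000000-0000-0000-0000-000000000001/1050
# (00000000-0000-0000-0000-000000000001 is the gfid of the brick root)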

Every brick in the master volume looks like this one:
Host : gluster-ger-ber-12-int
# file: gluster-export/1050
trusted.afr.dirty=0x000000000000000000000000
trusted.afr.ger-ber-01-client-0=0x000000000000000000000000
trusted.afr.ger-ber-01-client-1=0x000000000000000000000000
trusted.afr.ger-ber-01-client-2=0x000000000000000000000000
trusted.afr.ger-ber-01-client-3=0x000000000000000000000000
trusted.gfid=0xd4815ee4334841059136d0219d956ed8
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.1c31dc4d-7ee3-423b-8577-c7b0ce2e356a.stime=0x56606290000c7e4e
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x567428e000042116
trusted.glusterfs.dht=0x000000010000000055555555aaaaaaa9

On the slave volume, only the bricks of wien-02 and wien-03 have the same trusted.gfid:
Host : gluster-wien-03
# file: gluster-export/1050
trusted.afr.aut-wien-01-client-0=0x000000000000000000000000
trusted.afr.aut-wien-01-client-1=0x000000000000000000000000
trusted.gfid=0xd4815ee4334841059136d0219d956ed8
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x5638bfb5000379c0
trusted.glusterfs.dht=0x00000001000000000000000055555554

None of the nodes in 'History Crawl' have this trusted.gfid assigned:
Host : gluster-wien-05
# file: gluster-export/1050
trusted.afr.aut-wien-01-client-2=0x000000000000000000000000
trusted.afr.aut-wien-01-client-3=0x000000000000000000000000
trusted.glusterfs.6a071cfa-b150-4f0b-b1ed-96ab5d4bd671.xtime=0x5638bfb5000379c0
trusted.glusterfs.dht=0x0000000100000000aaaaaaaaffffffff

I'm not sure whether this is normal, or whether that trusted.gfid should have been assigned on all slave nodes by the slave-upgrade.sh script.

As per the doc, it applies the gfid on all slave nodes.

bash slave-upgrade.sh localhost:<aut-wien-01> /tmp/master_gfid_file.txt $PWD/gsync-sync-gfid was run on wien-02, which has passwordless login to every other slave node. As I could see in the process list, slave-upgrade.sh was running on each slave node and, as far as I remember, starts with a 'rm -rf ~/.glusterfs/...'. So the mentioned gfid should have been removed by slave-upgrade.sh, but should the trusted.gfid also be re-assigned by the script?
I'm confused: is the 'Stale file handle' message caused by the missing trusted.gfid for /gluster-export/1050/ on the nodes where the message appears? And does it make sense to stop geo-replication and run the slave-upgrade.sh script on the affected nodes only, without access to the other nodes, to fix this?
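
My rough understanding of what the script does per brick, reconstructed from memory and from this thread rather than from the actual slave-upgrade.sh source:

# NOT the actual script, only a reconstruction:
rm -rf /gluster-export/.glusterfs                            # drop the old gfid namespace
find /gluster-export -exec setfattr -x trusted.gfid {} \;    # strip the old gfids
# gsync-sync-gfid then walks master_gfid_file.txt and re-applies every
# gfid through an aux mount of the slave volume, which is where the
# "setxattr on ... failed" lines in screenlog.0 come from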

Currently I'm not sure whether the 'stale file handle' messages are what keeps geo-replication from running, but I guess the best way is to work through it step by step.
Any help is appreciated.

best regards
dietmar



[ 14:45:42 ] - root@gluster-ger-ber-07 /var/log/glusterfs/geo-replication/ger-ber-01 $gluster volume geo-replication ger-ber-01 gluster-wien-02::aut-wien-01 status detail

MASTER NODE           MASTER VOL    MASTER BRICK       SLAVE                               STATUS     CHECKPOINT STATUS    CRAWL STATUS     FILES SYNCD    FILES PENDING    BYTES PENDING    DELETES PENDING    FILES SKIPPED
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
gluster-ger-ber-07    ger-ber-01    /gluster-export    gluster-wien-07-int::aut-wien-01    Active     N/A                  History Crawl    -6500          0                0                5                  6500
gluster-ger-ber-12    ger-ber-01    /gluster-export    gluster-wien-06-int::aut-wien-01    Passive    N/A                  N/A              0              0                0                0                  0
gluster-ger-ber-11    ger-ber-01    /gluster-export    gluster-wien-03-int::aut-wien-01    Active     N/A                  Hybrid Crawl     0              8191             0                0                  0
gluster-ger-ber-09    ger-ber-01    /gluster-export    gluster-wien-05-int::aut-wien-01    Active     N/A                  History Crawl    -5792          0                0                0                  5793
gluster-ger-ber-10    ger-ber-01    /gluster-export    gluster-wien-02-int::aut-wien-01    Passive    N/A                  N/A              0              0                0                0                  0
gluster-ger-ber-08    ger-ber-01    /gluster-export    gluster-wien-04-int::aut-wien-01    Passive    N/A                  N/A              0              0                0                0                  0
[ 14:45:46 ] - root@gluster-ger-ber-07 /var/log/glusterfs/geo-replication/ger-ber-01 $


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users


