On 09/10/2014 03:27 AM, Paul Guo wrote:
> Hello,
>
> Recently I spent a bit of time understanding rebalance, since I want to know how it performs as more and more bricks are added to my glusterfs volume and as the volume holds more and more files and directories. During the test I saw something that really confuses me.
>
> Steps:
>
> SW versions: glusterfs 3.4.4 + centos 6.5
> Initial configuration: replica 2, lab1:/brick1 + lab2:/brick1
> fuse_mount it on /mnt
> cp -rf /sbin /mnt (~300+ files under /sbin)
> add two more bricks: lab1:/brick2 + lab2:/brick2
> run gluster rebalance
>
> 1) fix-layout only (e.g. gluster volume rebalance g1 fix-layout start)
>
> After rebalance is done (observed via "gluster volume rebalance g1 status"), I found there is no file under lab1:/brick2/sbin. The hash ranges of the new brick lab1:/brick2/sbin and the old brick lab1:/brick1/sbin appear to be ok.
>
> [root@lab1 Desktop]# getfattr -dm. -e hex /brick2/sbin
> getfattr: Removing leading '/' from absolute path names
> # file: brick2/sbin
> trusted.gfid=0x35976c2034d24dc2b0639fde18de007d
> trusted.glusterfs.dht=0x00000001000000007fffffffffffffff
>
> [root@lab1 Desktop]# getfattr -dm. -e hex /brick1/sbin
> getfattr: Removing leading '/' from absolute path names
> # file: brick1/sbin
> trusted.gfid=0x35976c2034d24dc2b0639fde18de007d
> trusted.glusterfs.dht=0x0000000100000000000000007ffffffe
>
> The question is: AFAIK, fix-layout would create "linkto" files (files with the "linkto" xattr and with only the sticky bit set) for those files whose hash values belong to the new subvol, so there should have been some "linkto" files under lab1:/brick2, but there are none now. Why?
fix-layout only fixes the layout, i.e., it spreads the layout to the newer bricks (or to bricks that previously did not participate in the layout). It does not create the linkto files.
After fix-layout, if you perform a lookup on a file that should belong to the newer brick according to the layout and the hash of its file name, you should see the linkto file appear.
Hope this explains (1).
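To see what fix-layout actually wrote, the two trusted.glusterfs.dht values quoted above can be decoded with a short script. This is only a minimal sketch: it assumes the xattr holds four big-endian 32-bit words (count, type, range start, range stop), which matches the values shown but should be checked against the DHT source for your version. The function name decode_dht_layout is made up for this example.

```python
import struct

def decode_dht_layout(hex_value):
    """Decode a trusted.glusterfs.dht value as printed by `getfattr -e hex`.

    Assumption: the value is four big-endian 32-bit words,
    (count, type, start, stop). Returns the (start, stop) hash range.
    """
    digits = hex_value[2:] if hex_value.startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    count, layout_type, start, stop = struct.unpack(">IIII", raw)
    return start, stop

# Values copied from the getfattr output quoted in this thread.
brick1 = decode_dht_layout("0x0000000100000000000000007ffffffe")
brick2 = decode_dht_layout("0x00000001000000007fffffffffffffff")

print("brick1/sbin: 0x%08x - 0x%08x" % brick1)
print("brick2/sbin: 0x%08x - 0x%08x" % brick2)
```

Under that assumption, brick1/sbin covers the lower half of the 32-bit hash space and brick2/sbin the upper half, which is what "the layout was spread to the new brick" means here.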
> 2) fix-layout + data_migrate (e.g. gluster volume rebalance g1 start)
>
> After migration is done, I saw linkto files under brick2/sbin. There are 300+ files in total under the system /sbin. Under brick2/sbin, I found that all 300+ files are there, either migrated or linkto-ed.
>
> -rwxr-xr-x 2 root root  17400 Sep 10 12:02 vmcore-dmesg
> ---------T 2 root root      0 Sep 10 12:03 weak-modules
> ---------T 2 root root      0 Sep 10 12:03 wipefs
> -rwxr-xr-x 2 root root 295656 Sep 10 12:02 xfsdump
> -rwxr-xr-x 2 root root 510000 Sep 10 12:02 xfs_repair
> -rwxr-xr-x 2 root root 348088 Sep 10 12:02 xfsrestore
>
> And under brick1/sbin, the migrated files are gone, as expected. There are close to 150 files under brick/sbin.
>
> This confuses me, since creating those linkto files seems unnecessary, at least for files whose hash values do not belong to the subvol. (My understanding is that if a file's hash value is in the range of a subvol, then it will be stored in that subvol.)
Can you check if a lookup of the file post rebalance clears up these _stale_ linkto files?
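One way to run that check is to list the linkto files on the brick before and after the lookup and compare. The sketch below is an illustration, not a Gluster tool: it treats a file as a linkto file when it is a zero-byte regular file whose only mode bit is the sticky bit (the ---------T entries in the listing) and it carries the trusted.glusterfs.dht.linkto xattr. The helper names are made up, os.getxattr is Linux-only, and reading trusted.* xattrs needs root on the brick host.

```python
import os
import stat

# Xattr that marks a DHT linkto file; its value names the subvolume
# that holds the real data (assumed to be a NUL-terminated string).
LINKTO_XATTR = "trusted.glusterfs.dht.linkto"

def looks_like_linkto(mode, size):
    """True for a zero-byte regular file whose mode bits are only the sticky bit."""
    return (stat.S_ISREG(mode)
            and stat.S_IMODE(mode) == stat.S_ISVTX
            and size == 0)

def find_linkto_files(brick_dir):
    """Return sorted (name, target) pairs for linkto files directly under brick_dir."""
    found = []
    for entry in os.scandir(brick_dir):
        st = entry.stat(follow_symlinks=False)
        if not looks_like_linkto(st.st_mode, st.st_size):
            continue
        try:
            target = os.getxattr(entry.path, LINKTO_XATTR)
        except OSError:
            continue  # mode matches, but the linkto xattr is missing or unreadable
        found.append((entry.name, target.rstrip(b"\0").decode()))
    return sorted(found)

if __name__ == "__main__":
    # Path taken from the thread; run on the brick host, not on the FUSE mount.
    for name, target in find_linkto_files("/brick2/sbin"):
        print(name, "->", target)
```

Running it once, doing a stat on the files through /mnt, and running it again shows whether the lookup removed any of the entries.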
How did you compute the hashes of these files and decide that they do not belong to the new brick (i.e., brick2)? I computed them on my end and you are right (based on the layout you presented above), but I am curious how you arrived at the same conclusion.
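Once a file name's 32-bit hash is known, deciding which brick it belongs to is a plain range check against the layout. The sketch below assumes the hash has already been obtained by some other means (Gluster's own file-name hash function is not reproduced here) and that the ranges are inclusive at both ends, as the 0x7ffffffe / 0x7fffffff boundary in the quoted xattrs suggests. LAYOUT, owning_brick and the sample hash are hypothetical.

```python
# Ranges read off the trusted.glusterfs.dht values quoted earlier in the
# thread (third and fourth 32-bit words, assumed to be start and stop).
LAYOUT = {
    "brick1": (0x00000000, 0x7ffffffe),
    "brick2": (0x7fffffff, 0xffffffff),
}

def owning_brick(name_hash, layout=LAYOUT):
    """Return the brick whose inclusive range contains name_hash, or None."""
    for brick, (start, stop) in layout.items():
        if start <= name_hash <= stop:
            return brick
    return None  # would indicate a hole in the layout

# 0x12345678 is a made-up hash value, not the hash of any real file name.
print(owning_brick(0x12345678))
```

A file whose hash lands in brick1's range but which still has a linkto entry on brick2 is the kind of stale entry being discussed here.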
Rebalance could choose not to move a file and just create the linkto file instead, based on, for example, the space usage of the source and target bricks. I am not saying this is what happened here, but it is a possibility.
> I quickly looked at the code. gf_defrag_start_crawl() appears to be the function for this operation. I do see code that does file migration in that code path, but debugging shows that those "linkto" files do not seem to be created by gf_defrag_start_crawl(). I'm not that familiar with the code details and the theory, so I'm not sure who created those "linkto" files or why they are created.
I am going to leave this part at: dht_linkfile_create does this, and it would mostly happen during lookup.
Shyam