Daniel, what is the tla revision of your software? (seen from
glusterfs --version)

avati

2008/4/4, Daniel Maher <dma+gluster@xxxxxxxxx>:
>
> Hi all,
>
> While running a series of FFSB tests against my newly-created Gluster
> cluster, I caused glusterfsd to crash on one of the two storage nodes.
> The relevant lines from the log file are pastebin'd:
> http://pastebin.ca/970831
>
> Even more troubling is that when I restarted glusterfsd, the node did
> /not/ self-heal:
>
> The mountpoint on the client:
> [dfsA]# du -s /opt/gfs-mount/
> 2685304 /opt/gfs-mount/
>
> The DS on the node which did not fail:
> [dfsC]# du -s /opt/gfs-ds/
> 2685328 /opt/gfs-ds/
>
> The DS on the node which failed, ~5 minutes after restarting
> glusterfsd:
> [dfsD]# du -s /opt/gfs-ds/
> 27092 /opt/gfs-ds/
>
> Even MORE troubling, I restarted glusterfsd on the node which did not
> fail, to see if that would help - and it produced even more bizarre
> results:
>
> The mountpoint on the client:
> [dfsA]# du -s /opt/gfs-mount/
> 17520 /opt/gfs-mount/
>
> The DS on the node which did not fail:
> [dfsC]# du -s /opt/gfs-ds/
> 2685328 /opt/gfs-ds/
>
> The DS on the node which failed:
> [dfsD]# du -s /opt/gfs-ds/
> 27092 /opt/gfs-ds/
>
> A simple visual inspection shows that the files and directories
> clearly differ between the client and the two nodes. For example:
>
> (Client)
> [dfsA]# ls fillfile*
> fillfile0   fillfile11  fillfile14  fillfile2  fillfile5  fillfile8
> fillfile1   fillfile12  fillfile15  fillfile3  fillfile6  fillfile9
> fillfile10  fillfile13  fillfile16  fillfile4  fillfile7
> [dfsA]# ls -l fillfile?
> -rwx------ 1 root root  65536 2008-04-04 09:42 fillfile0
> -rwx------ 1 root root 131072 2008-04-04 09:42 fillfile1
> -rwx------ 1 root root 131072 2008-04-04 09:42 fillfile2
> -rwx------ 1 root root  65536 2008-04-04 09:42 fillfile3
> -rwx------ 1 root root  65536 2008-04-04 09:42 fillfile4
> -rwx------ 1 root root  65536 2008-04-04 09:42 fillfile5
> -rwx------ 1 root root      0 2008-04-04 09:42 fillfile6
> -rwx------ 1 root root      0 2008-04-04 09:42 fillfile7
> -rwx------ 1 root root 196608 2008-04-04 09:42 fillfile8
> -rwx------ 1 root root      0 2008-04-04 09:42 fillfile9
>
> (Node that didn't fail)
> [dfsC]# ls fillfile*
> fillfile0   fillfile13  fillfile18  fillfile22  fillfile4  fillfile9
> fillfile1   fillfile14  fillfile19  fillfile23  fillfile5
> fillfile10  fillfile15  fillfile2   fillfile24  fillfile6
> fillfile11  fillfile16  fillfile20  fillfile25  fillfile7
> fillfile12  fillfile17  fillfile21  fillfile3   fillfile8
> [dfsC]# ls -l fillfile?
> -rwx------ 1 root root  65536 2008-04-04 09:42 fillfile0
> -rwx------ 1 root root 131072 2008-04-04 09:42 fillfile1
> -rwx------ 1 root root 131072 2008-04-04 09:42 fillfile2
> -rwx------ 1 root root  65536 2008-04-04 09:42 fillfile3
> -rwx------ 1 root root  65536 2008-04-04 09:42 fillfile4
> -rwx------ 1 root root  65536 2008-04-04 09:42 fillfile5
> -rwx------ 1 root root      0 2008-04-04 09:42 fillfile6
> -rwx------ 1 root root      0 2008-04-04 09:42 fillfile7
> -rwx------ 1 root root 196608 2008-04-04 09:42 fillfile8
> -rwx------ 1 root root      0 2008-04-04 09:42 fillfile9
>
> (Node that failed)
> [dfsD]# ls fillfile*
> fillfile0   fillfile11  fillfile14  fillfile2  fillfile5  fillfile8
> fillfile1   fillfile12  fillfile15  fillfile3  fillfile6  fillfile9
> fillfile10  fillfile13  fillfile16  fillfile4  fillfile7
> [dfsD]# ls -l fillfile?
> -rwx------ 1 root root   65536 2008-04-04 09:08 fillfile0
> -rwx------ 1 root root  131072 2008-04-04 09:08 fillfile1
> -rwx------ 1 root root 4160139 2008-04-04 09:08 fillfile2
> -rwx------ 1 root root  327680 2008-04-04 09:08 fillfile3
> -rwx------ 1 root root  262144 2008-04-04 09:08 fillfile4
> -rwx------ 1 root root   65536 2008-04-04 09:08 fillfile5
> -rwx------ 1 root root 1196446 2008-04-04 09:08 fillfile6
> -rwx------ 1 root root  131072 2008-04-04 09:08 fillfile7
> -rwx------ 1 root root 3634506 2008-04-04 09:08 fillfile8
> -rwx------ 1 root root  131072 2008-04-04 09:08 fillfile9
>
> What the heck is going on here? Three wildly different results -
> that's really not a good thing. These results seem "permanent" as
> well - after waiting a good 10 minutes (and executing the same du
> command a few more times), the results are the same...
>
> Finally, I edited "fillfile6" (0 bytes on dfsA and dfsC, 1196446
> bytes on dfsD) via the mountpoint on dfsA, and the changes were
> immediately reflected on the storage nodes. Clearly the AFR
> translator is operational /now/, but the enormous discrepancy is not
> a good thing, to say the least.
>
> --
> Daniel Maher <dma AT witbe.net>
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>

--
If I traveled to the end of the rainbow
As Dame Fortune did intend,
Murphy would be there to tell me
The pot's at the other end.
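
For context on the self-heal behaviour described above: in AFR of this
era, healing is access-driven - a stale replica is only repaired when
the file is looked up or opened through the client mountpoint, so a
node that crashed mid-test will not catch up on its own just because
glusterfsd was restarted. A minimal sketch of forcing a heal across the
whole volume, assuming the client mount is /opt/gfs-mount as in the
report (commands are illustrative, not an official tool):

    # Walk the mount and read the first byte of every file;
    # each open should trigger AFR's per-file self-heal.
    find /opt/gfs-mount -type f -print0 | xargs -0 head -c1 > /dev/null

    # Directories are healed on lookup, so a recursive listing
    # covers them as well.
    ls -lR /opt/gfs-mount > /dev/null

After the walk completes, re-running the du comparison on dfsC and dfsD
should show the exports converging.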
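
To pin down exactly which files diverge between the two backends
(before and after forcing a heal), checksumming each export and diffing
the results is a quick, filesystem-agnostic check. This sketch assumes
the export directory is /opt/gfs-ds on both dfsC and dfsD, as shown in
the du output above:

    # On dfsC: checksum every file in the export, in stable path order.
    cd /opt/gfs-ds && find . -type f -print0 | sort -z \
        | xargs -0 md5sum > /tmp/dfsC.md5

    # On dfsD: produce the same listing.
    cd /opt/gfs-ds && find . -type f -print0 | sort -z \
        | xargs -0 md5sum > /tmp/dfsD.md5

    # Copy one file to the other host, then compare:
    diff /tmp/dfsC.md5 /tmp/dfsD.md5

Lines present on only one side are missing files; lines with the same
path but different checksums are the partially-written replicas like
fillfile6.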