On Tue, 12 May 2009 00:53:06 -0700, Liam Slusser <lslusser@xxxxxxxxx>
wrote:

>>> Even with manually fixing (adding or removing) the extended attributes i
>>> was never able to get Gluster to see the missing files. So i ended up
>>> writing a quick program that searched the raw bricks filesystem and then
>>> checked to make sure the file existed in the Gluster cluster and if it
>>> didn't it would tag the file. Once that job was done i shut down Gluster,
>>> moved all the missing files off the raw bricks into temp storage, and
>>> then i restarted Gluster and copied all the files back into each
>>> directory. That fixed the missing file problems.
>>>
>>> Id still like to find out why Gluster would ignore certain files without
>>> the correct attributes. Even removing all the file attributes wouldn't
>>> fix the problem. I also tried manually coping a file into a brick which
>>> it still wouldn't find. It would be nice to be able to manual copy files
>>> into a brick, then set an extended attribute flag which would cause
>>> gluster to see the new file(s) and copy them to all bricks after a
>>> ls -alR was done. Or even better just do it automatically when new files
>>> without attributes are found in a brick.
>>>
>>
>> It sounds like you are experiencing this known yet dangerous bug:
>> http://gluster.org/docs/index.php/Understanding_AFR_Translator#Known_Issues
>>
>> Quote:
>> Self-heal of a file that does not exist on the first subvolume:
>> If a file does not exist on the first subvolume but exists on some other
>> subvolume, it will not show up in the output of 'ls'. This is because the
>> replicate translator fetches the directory listing only from the first
>> subvolume. Thus, the file that does not exist on the first subvolume is
>> never seen and never healed. However, if you know the name of the file
>> and do a 'stat' on the file or try to access it in any other way, the
>> file will be properly healed and created on the first subvolume.
>
> Interesting. Thanks for replying. Yeah this does sound like the bug.
> However i was not able to stat or access the file what-so-ever. It always
> replied with "file not found" and nothing in the logs. Could this be
> caused because the whole directory is missing on the first volume?

Sounds plausible. There are also other, more subtle issues still present
in AFR/Replicate (e.g. BerkeleyDB doesn't work at all, and SQLite sort of
works more often than not, but it's very twitchy). I wouldn't deploy it
into a production environment as it is at the moment.

Gordan
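
For reference, the brick-versus-mount check Liam describes could be sketched roughly like this. It is not his actual program, which was not posted; the function name and both path arguments are made up for illustration. It walks the raw brick directory and does a stat on the same relative path through the Gluster mount, which is also the kind of access the quoted Known Issues text says should trigger self-heal. Anything that still comes back "file not found" is reported.

```python
import os


def find_unlisted_files(brick_root, mount_root):
    """Yield brick-relative paths that cannot be stat'ed through the mount.

    brick_root: backend directory of one brick (hypothetical example path).
    mount_root: directory where the Gluster volume is mounted (hypothetical).
    """
    for dirpath, _dirnames, filenames in os.walk(brick_root):
        # Path of the current directory relative to the top of the brick.
        rel_dir = os.path.relpath(dirpath, brick_root)
        for name in filenames:
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            try:
                # Named lookup through the mount, not a directory listing.
                os.lstat(os.path.join(mount_root, rel_path))
            except FileNotFoundError:
                # Present on the brick, invisible through the volume.
                yield rel_path
```

It would be called as something like find_unlisted_files("/data/brick1", "/mnt/gluster"), with both paths replaced by whatever the local setup uses, and run once per brick. It only reports; moving files off the bricks and copying them back through the mount, as Liam did, is left as a manual step.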