On Tue, May 12, 2009 at 12:07 AM, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
Liam Slusser wrote:
Even with manually fixing (adding or removing) the extended attributes, I was never able to get Gluster to see the missing files. So I ended up writing a quick program that walked the raw bricks' filesystems, checked that each file also existed in the Gluster cluster, and tagged any file that didn't (a rough sketch of this kind of scan follows below). Once that job was done, I shut down Gluster, moved all the missing files off the raw bricks into temporary storage, then restarted Gluster and copied the files back into each directory. That fixed the missing-file problems.
I'd still like to find out why Gluster ignores certain files that lack the correct attributes. Even removing all of a file's extended attributes wouldn't fix the problem. I also tried manually copying a file into a brick, and Gluster still wouldn't find it. It would be nice to be able to manually copy files into a brick and then set an extended-attribute flag that would make Gluster see the new file(s) and replicate them to all bricks after an 'ls -alR'. Or, even better, have that happen automatically whenever new files without attributes are found in a brick.
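For illustration, a minimal Python sketch of the kind of brick-vs-mount scan described above (BRICK_ROOT, GLUSTER_ROOT, and the tag file are hypothetical; adjust them to the actual layout):

import os

BRICK_ROOT = "/export/brick1"     # raw backend filesystem of one brick (hypothetical path)
GLUSTER_ROOT = "/mnt/glusterfs"   # client mount of the same volume (hypothetical path)
TAG_FILE = "/tmp/missing-files.txt"

with open(TAG_FILE, "w") as tag:
    for dirpath, dirnames, filenames in os.walk(BRICK_ROOT):
        rel = os.path.relpath(dirpath, BRICK_ROOT)
        for name in filenames:
            mount_path = os.path.join(GLUSTER_ROOT, rel, name)
            # Named lookup through the client mount; if Gluster cannot
            # see the file, record the brick-side path for manual repair.
            if not os.path.exists(mount_path):
                tag.write(os.path.join(dirpath, name) + "\n")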
It sounds like you are experiencing this known yet dangerous bug:

http://gluster.org/docs/index.php/Understanding_AFR_Translator#Known_Issues
Quote:
Self-heal of a file that does not exist on the first subvolume:
If a file does not exist on the first subvolume but exists on some other subvolume, it will not show up in the output of 'ls'. This is because the replicate translator fetches the directory listing only from the first subvolume. Thus, the file that does not exist on the first subvolume is never seen and never healed. However, if you know the name of the file and do a 'stat' on the file or try to access it in any other way, the file will be properly healed and created on the first subvolume.
So, either the directory listing should be fetched from the read-subvolume, or, better, from all nodes (though that gets slow). If it were at least fetched from the read-subvolume, you could run a cron job on each server that does an 'ls -laR' (something like the sketch below), which would force the files into sync (since each server probably has itself as its read-subvolume, the missing files would be found). But that doesn't seem to be how it works at the moment.
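As a rough illustration of that cron-job idea, in Python rather than a shell one-liner (the mount point is hypothetical); recursively lstat'ing every entry is effectively what 'ls -laR' does, and a named lookup is the documented self-heal trigger:

import os

GLUSTER_ROOT = "/mnt/glusterfs"   # hypothetical client mount point

for dirpath, dirnames, filenames in os.walk(GLUSTER_ROOT):
    for name in dirnames + filenames:
        try:
            # lstat() is the named lookup that triggers replicate self-heal
            os.lstat(os.path.join(dirpath, name))
        except OSError:
            pass  # entries that vanish mid-crawl shouldn't abort the run

Note that such a crawl can only touch names that actually appear in the directory listing, which is exactly why it only helps if listings come from the read-subvolume.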
Gordan
Interesting. Thanks for replying. Yeah, this does sound like the bug. However, I was not able to stat or access the file whatsoever; it always returned "file not found", with nothing in the logs. Could this be because the whole directory is missing on the first subvolume?
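If a missing parent directory is indeed the problem, one speculative thing to try is stat'ing each path component top-down from the mount root, so the directories get a named lookup before the file does (the paths here are hypothetical):

import os

GLUSTER_ROOT = "/mnt/glusterfs"   # hypothetical client mount point
missing = "some/dir/file.dat"     # hypothetical path, relative to the mount

path = GLUSTER_ROOT
for part in missing.split("/"):
    path = os.path.join(path, part)
    try:
        os.lstat(path)            # named lookup on each component in turn
    except OSError as err:
        print("lookup failed at %s: %s" % (path, err))
        break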
I do have a test cluster I set up for trying new versions/configurations, so I think I can reproduce this scenario - if that would be of any use to anybody...
thanks,
liam