--- On Wed, 7/30/08, Łukasz Osipiuk <lukasz@xxxxxxxxxxx> wrote:

>> Step 1: Client1: cp test_file.txt /mnt/gluster/
>> Step 2: Brick1 and Brick4: have test_file.txt in the
>> /mnt/gluster/ directory
>> Step 3: Client1: ls /mnt/gluster - test_file.txt is present
>>
>> Step 4: Brick1: rm /mnt/gluster/test_file.txt
>> Step 5: Client1: cat /mnt/gluster/test_file.txt -> we get the
>> contents of the file from Brick4
>>
>> Step 6: Brick1: ls /home/export is empty. Self-heal
>> did not recover the file.
>
> > I suspect that this is normal; you are not supposed to modify the
> > bricks manually from underneath AFR. AFR uses extended attributes
> > to keep file version metadata. When you manually deleted the file
> > in step 4, the directory version metadata was not updated, so I
> > suspect that caused the mismatch to go undetected. The self-heal
> > would have occurred if the brick node had been down, the file had
> > been deleted by a client, and then the brick node had returned to
> > operation.
> >
> > -Martin
> ------
>
> Martin, it is obvious that one normally should not modify the AFR
> backend directly. The experiment Tomáš (and I) made was a
> simulation of a real-life problem where you lose some data on one
> of the data bricks.

I understand. I am not sure that AFR is equipped to handle all of
these types of failures: some of them, yes, but not all. Mostly, the
versioning mechanisms are aimed at healing from network/node outages,
not from disk corruption. If you want that, you will probably have to
put RAID under your local filesystems. Although, someone did mention
in a post a while ago an alpha-stage translator that will do
checksumming on a local device.

> The more extreme example is: one of the data bricks explodes and
> you replace it with a new one, configured as the one that went off,
> but with an empty HD. This is the same as the above experiment, but
> all data is gone, not just one file.

AFR should actually handle this case fine.
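To make the "mismatch goes undetected" part concrete, here is a toy model of AFR's version metadata (all names here are made up for illustration; real AFR keeps this state in extended attributes on the bricks, not in Python objects):

```python
# Toy model: why a manual rm on a brick goes undetected by self-heal.
# AFR bumps version metadata only on operations made *through* it.

class Brick:
    def __init__(self):
        self.files = {}
        self.dir_version = 0   # bumped on every change made through AFR

def afr_create(bricks, name, data):
    # A write through the client updates every brick and its version.
    for b in bricks:
        b.files[name] = data
        b.dir_version += 1

def afr_needs_heal(b1, b2):
    # Self-heal compares only the version metadata, not the contents.
    return b1.dir_version != b2.dir_version

brick1, brick4 = Brick(), Brick()
afr_create([brick1, brick4], "test_file.txt", "hello")

# Step 4: file removed directly on brick1, bypassing AFR, so the
# directory version is NOT bumped and the mismatch is invisible.
del brick1.files["test_file.txt"]

print(afr_needs_heal(brick1, brick4))   # False: no heal is triggered
```

Both bricks still claim the same version, so AFR has no reason to compare contents; that is the whole failure mode Tomáš observed.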
When you install a new brick and it is empty, there is no metadata
for any files or directories on it, so it will self-heal (lazily).
The problem you described above occurs because you have metadata
saying that your files (the directory, actually) are up to date, but
the directory is not, since it was modified manually under the hood.
AFR cannot detect this (yet); it trusts its metadata.

> Is there a way to make GlusterFS "heal" so the new node contains
> replicated data from its mirror?

In your case, yes: if you either delete the attributes for the
out-of-date files/directories or set them to a lower version than
their peer's, they should heal on the next find/access.

> I tried the find-head pattern but it doesn't help :(

See above; AFR does not know they are out of date, so this won't
help.

It does seem like it would be fairly easy to add another metadata
attribute to each file/directory that would hold a checksum for it.
This way, AFR itself could be configured to check/compute the
checksum any time the file is read/written. Since this would slow AFR
down, I would suggest a configuration option to turn it on. If the
checksum is wrong, AFR could heal to the version on the other brick,
provided the other brick's checksum is correct.

Another alternative would be to create an offline checksummer that
updates such an attribute if it does not exist, and checks the
checksum if it does. If the check fails, it would simply delete the
file and its attributes (and potentially the directory attributes up
the tree) so that AFR will then heal it. The only modification needed
in AFR to support this would be to delete the checksum attribute any
time the file/directory is updated, so that the offline checksummer
will recreate it instead of thinking the file is corrupt.
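The "set them to a lower version" repair can be sketched the same way, as a toy model (not AFR's actual code; on a real brick you would reset the trusted.afr extended attributes with setfattr and then trigger a lookup):

```python
# Toy model of the manual repair: mark the stale brick's version
# lower than its peer's so the next lookup triggers a lazy self-heal.

class Brick:
    def __init__(self, files, dir_version):
        self.files = dict(files)
        self.dir_version = dir_version

def afr_lookup_and_heal(stale, fresh):
    # On access, AFR compares versions; the lower-versioned side is
    # overwritten from the higher-versioned peer.
    if stale.dir_version < fresh.dir_version:
        stale.files = dict(fresh.files)
        stale.dir_version = fresh.dir_version

# brick1 lost the file out-of-band but still claims version 1,
# so by itself nothing would heal.
brick1 = Brick({}, dir_version=1)
brick4 = Brick({"test_file.txt": "hello"}, dir_version=1)

brick1.dir_version = 0              # the manual "mark stale" step
afr_lookup_and_heal(brick1, brick4)
print(sorted(brick1.files))         # ['test_file.txt']: healed
```

Once the version is lowered (or the attributes deleted entirely, which amounts to the same thing), the usual find-triggered access pattern does the rest.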
In fact, even this could be eliminated so that the offline
checksummer is completely "self-powered": any time it calculates a
checksum, it could copy the GlusterFS version and timestamp
attributes into two new "checksummer" attributes. If these become out
of date, the checksummer will know to recompute the checksum instead
of assuming that the file has been corrupted.

The one risk with this is that if a file gets corrupted on both
nodes, it will get deleted on both nodes, so you will not have a
corrupted file to at least look at. This too could be overcome by
saving any deleted files in a separate "trash can" and cleaning the
trash can once the files in it have been healed; sort of a
self-cleaning lost+found directory.

I know this may not be the answer you were looking for, but I hope it
helps clarify things a little.

-Martin
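P.S. A rough sketch of such an offline checksummer (entirely hypothetical, not part of GlusterFS; for portability this sketch keeps its records in a sidecar JSON file, where a real tool would use an extended attribute plus copies of AFR's version/timestamp attributes to tell "updated" from "corrupted"):

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path

def checksum(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def scan(brick: Path, trash: Path) -> list:
    """Record checksums for unseen files; quarantine files whose
    contents changed while their recorded checksum did not, so AFR
    can heal them from the peer brick.  Returns quarantined names."""
    meta_file = brick / ".checksums.json"
    meta = json.loads(meta_file.read_text()) if meta_file.exists() else {}
    trash.mkdir(exist_ok=True)
    corrupted = []
    for f in sorted(brick.iterdir()):
        if f.name.startswith("."):     # skip our own bookkeeping/trash
            continue
        digest = checksum(f)
        if f.name not in meta:
            meta[f.name] = digest      # first visit: just record it
        elif meta[f.name] != digest:
            # Self-cleaning lost+found: keep the bad copy around.
            shutil.move(str(f), str(trash / f.name))
            del meta[f.name]
            corrupted.append(f.name)
    meta_file.write_text(json.dumps(meta))
    return corrupted

# Quick demo on a throwaway directory standing in for a brick.
brick = Path(tempfile.mkdtemp())
(brick / "a.txt").write_text("good data")
scan(brick, brick / ".trash")            # first pass: record checksums
(brick / "a.txt").write_text("corrupt")  # simulate silent bit rot
print(scan(brick, brick / ".trash"))     # ['a.txt'] moved to .trash
```

The quarantined copy survives in the trash directory for inspection, which addresses the both-nodes-corrupted risk mentioned above.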