On Fri, 3 Apr 2009 13:34:29 +0200, nicolas prochazka <prochazka.nicolas@xxxxxxxxx> wrote:
> It seems there are a lot of problems with self healing, and one of them is
> that glusterfs is using one server as a reference (the first in subvolumes)
> (afr_sh_select_source?)

This brings up an interesting point - what is the conflict resolution supposed to be? The favorite-child option should be the resolution of last resort (i.e. used only when the timestamp metadata is identical). The primary resolution should be, IIRC, that the latest file wins (a rough sketch of the policy I'm assuming is at the end of this mail).

However, this poses potential problems. Consider this scenario: the primary crashes, so we only have the secondary. Files on the secondary change while it is the only server. The primary comes back, but crashes again mid-sync. The next time it comes back, it has a partially synced file, and it is the favorite-child, so unless the metadata (specifically timestamps) gets synced _last_, the partially synced file would clobber the whole file.

Does the metadata get synced last? It's the only sane option as far as I can tell, but I've seen situations before where the timestamps on the new server get stuck at the epoch (01-01-1970) after a (successful) resync.

Can somebody point at a definitive spec document for how AFR healing is _supposed_ to operate under various failure and resync scenarios? It currently seems to be in quite a dangerous state, and nowhere near enough warning is being given about something that can cause extensive data corruption/loss.

If such a specification exists, then it should be pretty easy to create test cases for it. Speaking of which, is there a test harness available? It would be really useful to be able to do something like "make test" before "make install". It would also encourage more technical users to add test cases for things they find broken, and it would provide a baseline for regressions, to make sure that something that worked is never broken in a later release. My perception is that the stability/bug count has been getting progressively worse in every release since rc1.

Another thing - since keeping files in sync is so problematic at the moment, how about md5 and last-sync-timestamp fields in the metadata for each file? This, coupled with an external cron job that computes/verifies/updates these fields and can be run [daily|weekly|monthly] (depending on the amount of data), would at least provide a secondary sanity check to make sure that file corruption/de-sync gets detected early and reliably (a sketch of such a checker is also at the end of this mail). Not having such a thing is really just sticking one's head in the sand and ignoring the issue.

Another thing - if a file is open for write, I think there should be a metadata flag set, and it should be unset when the last write handle is closed. When the server comes up, if any such flags are set before any write opens have been received, then the file should be marked as crashed, and it should explicitly be prevented from being the sync source.

There are a lot of error-resync use cases, and it might be a good time for them to be enumerated and systematically tested against to minimize the risk of data loss.

Gordan
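
P.S. To make the resolution order I'm describing above concrete, here is a rough sketch in Python. This is purely illustrative of the policy I am assuming/arguing for - it is not the actual afr_sh_select_source() logic - and the "dirty" flag is the hypothetical crashed-mid-write/mid-sync marker suggested above, not something AFR stores today.

# Illustrative only: the source-selection order I am assuming/arguing for.
from typing import List, NamedTuple, Optional

class Copy(NamedTuple):
    server: str          # subvolume/server holding this copy
    mtime: float         # last-modified timestamp of this copy
    dirty: bool = False  # hypothetical "crashed mid-write/mid-sync" flag

def select_sync_source(copies: List[Copy], favorite_child: str) -> Optional[Copy]:
    """Pick the copy that should win a self-heal conflict."""
    # Never use a copy known to be partial/crashed as the sync source.
    candidates = [c for c in copies if not c.dirty]
    if not candidates:
        return None  # nothing trustworthy - needs manual intervention

    newest = max(candidates, key=lambda c: c.mtime)
    tied = [c for c in candidates if c.mtime == newest.mtime]

    if len(tied) == 1:
        return tied[0]  # primary rule: latest file wins

    # Last resort: timestamps are identical, fall back to favorite-child.
    for c in tied:
        if c.server == favorite_child:
            return c
    return tied[0]

# Example: the secondary has the newer, complete file, so it should win
# even though the partially resynced primary is the favorite-child.
print(select_sync_source(
    [Copy("primary", mtime=100.0, dirty=True),
     Copy("secondary", mtime=200.0)],
    favorite_child="primary"))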
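
And a similarly rough sketch of the external checksum cron job idea. The xattr names (user.sync.md5, user.sync.timestamp) are placeholders I've made up, and it assumes the backend filesystem supports user extended attributes; the point is only to show how little machinery a secondary sanity check would need.

#!/usr/bin/env python3
# Rough sketch of the external checksum cron job suggested above.
import hashlib, os, sys, time

MD5_ATTR = "user.sync.md5"        # placeholder attribute names, not real
TS_ATTR = "user.sync.timestamp"   # GlusterFS/AFR metadata keys

def md5sum(path: str) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def check_or_stamp(path: str) -> None:
    """Verify the stored checksum, or (re)stamp it if the file has changed."""
    current = md5sum(path)
    try:
        stored_md5 = os.getxattr(path, MD5_ATTR).decode()
        stamped_at = float(os.getxattr(path, TS_ATTR).decode())
    except OSError:
        stored_md5, stamped_at = None, 0.0

    if stored_md5 is None or os.path.getmtime(path) > stamped_at:
        # New file, or legitimately modified since the last stamp: re-stamp.
        os.setxattr(path, MD5_ATTR, current.encode())
        os.setxattr(path, TS_ATTR, str(time.time()).encode())
    elif stored_md5 != current:
        # File has NOT changed since the stamp, yet the content differs:
        # silent corruption or de-sync - exactly what we want caught early.
        print(f"CHECKSUM MISMATCH: {path}", file=sys.stderr)

def walk(root: str) -> None:
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            check_or_stamp(os.path.join(dirpath, name))

if __name__ == "__main__":
    walk(sys.argv[1] if len(sys.argv) > 1 else ".")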