On 1/7/13 8:06 PM, Stephan von Krawczynski wrote:
> Joe Julian <joe at julianfamily.org> wrote:
>> Your app wants to append to the file again. It calls stat on the file.
>> Brick2 answers first stating that the file is 4k long. Your app seeks to
>> 4k and writes. Now the data you wrote before is gone.
>
> Forgive my ignorance, but it obvious that this implementation of a stat on a
> replicating fs is shit. Of course a stat should await _all_ returning local
> stats and should choose the stat of the _latest_ file version and note that
> the file needs self heal.

Ignorance is fine, but your rudeness is (still) unwelcome. If O_APPEND
is set, that is passed through, so we don't need a stat at all to ensure
that data is written at EOF. If you actually do a stat/write combo
without O_APPEND as Joe suggests, then there's an inherent race between
those two separate operations, and neither doing the stat on all
replicas nor anything else in POSIX (other than locking) will avoid it.
Your "obvious" answer is wrong.

> self-heal is no answer to this question. The only valid answer is choosing the
> _latest_ file version no matter if self heal is necessary or not.

Timestamps are totally unreliable as a conflict resolution mechanism.
Even if one were to accept the dependency on time synchronization,
there's still the possibility of drift as yet uncorrected by the
synchronization protocol. The change logs used by self heal are the
*only* viable solution here. If you want to participate constructively,
we could have a discussion about how those change logs should be set and
checked, and whether a brick should be allowed to respond to requests
for a file between coming up and completion of at least one self-heal
check (Mario's example would be a good one to follow), but insisting on
even less reliable methods isn't going to help.
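
To make the O_APPEND point concrete, here is a minimal sketch of the two append styles on an ordinary local POSIX file. It says nothing about GlusterFS internals. The function names and the `observed_size` parameter are made up for illustration; `observed_size` simply stands in for a stale size that a lagging replica might have reported to the stat call.

```python
import os
import tempfile


def append_o_append(path, data):
    """Append using O_APPEND: each write is positioned at the current EOF."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def append_stat_then_write(path, data, observed_size=None):
    """Append by stat + seek + write: two separate steps, hence racy.

    observed_size is a hypothetical knob that simulates a stale stat answer;
    when it is None, the real size is sampled with os.stat().
    """
    size = os.stat(path).st_size if observed_size is None else observed_size
    fd = os.open(path, os.O_WRONLY)
    try:
        os.lseek(fd, size, os.SEEK_SET)  # offset may already be out of date
        os.write(fd, data)
    finally:
        os.close(fd)


def demo():
    """Return (racy_result, safe_result) for the same starting file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "log")

        # The file really holds 8 bytes, but the writer believes it holds 4.
        with open(path, "wb") as f:
            f.write(b"A" * 8)
        append_stat_then_write(path, b"BBBB", observed_size=4)
        with open(path, "rb") as f:
            racy = f.read()

        # Same starting point, this time with O_APPEND and no stat at all.
        with open(path, "wb") as f:
            f.write(b"A" * 8)
        append_o_append(path, b"BBBB")
        with open(path, "rb") as f:
            safe = f.read()

    return racy, safe


if __name__ == "__main__":
    racy, safe = demo()
    print(racy)  # b'AAAABBBB': the last four original bytes were overwritten
    print(safe)  # b'AAAAAAAABBBB': the original data is intact
```

The stat-based variant loses data whenever the size it acts on is out of date, whether that size came from one replica or from all of them, because another writer can extend the file between the stat and the write. Short of locking, only the O_APPEND variant avoids that window.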