Anand Avati wrote:
When the server that dropped out reconnects, its version will be higher
than the new (reset) version, and its old file will clobber the new file.
This is not entirely correct. This can *potentially* happen if the two
clients (the client that created the first file and the client that
re-created it) are out of sync in time. AFR keeps the client system
create time in the xattr and uses that as a major version number (the other
thread discusses changing this major number to be equal to the parent dir
minor number).
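If |time(system1) - time(system2)| < |wall_time(file1) - wall_time(file2)|,
that is, if the clock skew between the two clients is smaller than the real
time that elapsed between the two creates, then there is no data corruption.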
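
To make that condition concrete, here is a minimal Python sketch. It is not
AFR code: the function name, the "offset" parameters (each client's clock
error relative to true time) and the sample numbers are all made up for
illustration. It only assumes what is described above, namely that each
client stamps the file with its own system time at create and that the
higher stamp wins.

```python
def newer_file_wins(offset_first, offset_second, t_first, t_second):
    """Return True if the re-created file ends up with the higher stamp.

    offset_* : hypothetical clock error (seconds) of each client
    t_*      : true (wall) time in seconds at which each client created the file
    """
    recorded_first = t_first + offset_first     # stamp written by the first client
    recorded_second = t_second + offset_second  # stamp written by the second client
    return recorded_second > recorded_first


# Skew of 0.5 s, creates 2 s apart: the recorded order matches the real order.
small_skew_ok = newer_file_wins(0.0, -0.5, 100.0, 102.0)

# Skew of 5 s, creates 2 s apart: the older file carries the higher stamp.
large_skew_ok = newer_file_wins(0.0, -5.0, 100.0, 102.0)
```

In the second case the second client's clock runs 5 seconds behind, so the
stale file looks newer and would win.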
Again we come back to the issue of the servers needing their time
synced. I keep thinking that some way to provide a stabilized, agreed-upon
time for all cluster members would do away with this problem. NTP
might just not be as good a solution as something provided within the
cluster, because then AFR members could decide not to participate if they
couldn't agree on and sync to the cluster time.
Also, to what precision is the time saved? I can imagine modify, move
and create operations happening within a very short period: short
enough to fall within the same second if one-second precision is all that
is kept, or, with sub-second precision, within the drift of NTP-synced
systems.
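
A small Python sketch of the precision concern. This only illustrates the
question and says nothing about what AFR actually stores: the helper name,
the units and the timestamps are invented, and times are kept as integer
nanoseconds to avoid floating-point rounding.

```python
NS_PER_SECOND = 1_000_000_000
NS_PER_MILLISECOND = 1_000_000


def stored_stamp(event_ns, unit_ns=NS_PER_SECOND):
    """Truncate an event time (in nanoseconds) to the precision being kept."""
    return event_ns // unit_ns


# Hypothetical create followed by a modify 0.3 seconds later.
create_ns = 1_200_000_000 * NS_PER_SECOND
modify_ns = create_ns + 300 * NS_PER_MILLISECOND

# At one-second precision both operations get the same stamp.
collide_at_seconds = stored_stamp(create_ns) == stored_stamp(modify_ns)

# At millisecond precision they can be told apart.
collide_at_millis = (
    stored_stamp(create_ns, NS_PER_MILLISECOND)
    == stored_stamp(modify_ns, NS_PER_MILLISECOND)
)
```

Even when the stamps differ at the finer precision, the ordering is only
trustworthy if the clocks that produced them agree to better than the gap
between the two operations.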
--
-Kevan Benson
-A-1 Networks