On Wed, 30 Apr 2008, Gareth Bult wrote:
> Sorry, I'm trying to follow this but I'm coming a little unstuck.
> Am I right in thinking the rolling hash / rsync solution would involve
> syncing the file "on open" as per the current system, and that to do
> this the server would have to read through the entire file to create
> the hashes?
> (Indeed, it would need to do this on two servers to create hashes for comparison?)
Yes.
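To see why the answer is "yes", here is a minimal sketch of an rsync-style weak rolling checksum. It is modelled on the Adler-32-like checksum rsync uses, not on any GlusterFS code, and every function name in it is hypothetical. The side with the stale copy checksums fixed, non-overlapping blocks. The side with the good copy slides a window over every byte offset and looks each window up in that table. Both passes touch every byte of the file, which is why both servers end up reading it in full.

```python
# Sketch of an rsync-style weak rolling checksum. Function names are
# hypothetical illustrations, not taken from rsync or GlusterFS.

M = 1 << 16  # modulus for each 16-bit half of the checksum


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the (a, b) halves of the weak checksum for one block from scratch."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window forward by one byte in O(1): drop out_byte, add in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def rolling_checksums(data: bytes, n: int):
    """Yield (offset, checksum) for every n-byte window. Every byte of data is read."""
    if len(data) < n:
        return
    a, b = weak_checksum(data[:n])
    yield 0, (b << 16) | a
    for k in range(1, len(data) - n + 1):
        a, b = roll(a, b, data[k - 1], data[k + n - 1], n)
        yield k, (b << 16) | a


def block_checksums(data: bytes, n: int) -> dict[int, int]:
    """Checksum -> offset for fixed, non-overlapping blocks (computed on the stale side).

    Keeps only one offset per checksum, which is a simplification.
    """
    out = {}
    for off in range(0, len(data) - n + 1, n):
        a, b = weak_checksum(data[off:off + n])
        out[(b << 16) | a] = off
    return out


# Usage: two bytes inserted at the front of the file. Matching still lines up
# the unchanged blocks, but only after hashing every offset of the new copy.
old = b"The quick brown fox jumps over the lazy dog."
new = b"XX" + old
table = block_checksums(old, 8)
matches = [(off, table[s]) for off, s in rolling_checksums(new, 8) if s in table]
```

A real implementation would confirm each weak match with a strong hash before trusting it. That adds CPU cost but does not change how much of the file must be read.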
> So, as a rough benchmark, assume 50MB/sec for a standard / modern
> SATA drive: opening a crashed 20GB file is going to take 400 seconds,
> or nearly seven minutes? (Which would also flatten two servers for the
> duration.)
The rolling hash approach would certainly be beneficial in cases where the
network is slow (e.g. WAN replication).
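The arithmetic on both sides of this exchange can be checked with a few lines. This counts only sequential read or transfer time at a sustained rate. The 1MB/s WAN figure is an assumption chosen purely for illustration, and the helper name is hypothetical.

```python
MB = 10**6  # decimal megabyte, matching the rough "50MB/sec" figure
GB = 10**9


def seconds_to_move(size_bytes: int, bytes_per_second: int) -> float:
    """Time to read (or send) size_bytes at a sustained rate."""
    return size_bytes / bytes_per_second


disk_rate = 50 * MB  # rough SATA sequential read rate from the thread
wan_rate = 1 * MB    # hypothetical slow WAN link (an assumption)

full_scan = seconds_to_move(20 * GB, disk_rate)       # hashing the whole 20GB file
journal_replay = seconds_to_move(10 * MB, disk_rate)  # reading a 10MB change log
full_copy_wan = seconds_to_move(20 * GB, wan_rate)    # naive full copy over the WAN

print(f"full scan:      {full_scan:.0f} s")       # 400 s
print(f"journal replay: {journal_replay:.1f} s")  # 0.2 s
print(f"full WAN copy:  {full_copy_wan:.0f} s")   # 20000 s
```

On a fast LAN the full local scan dominates, so the rolling hash buys little. Over a slow link, avoiding a 20000-second copy is worth the 400-second scan.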
> Whereas replaying a 10MB journal is going to take < 1s and be effectively
> transparent. (I'm guessing this could also be done at open time?)
A journal per se wouldn't work, because that implies a fixed size and
write-ahead logging. What would be required here is something more like
snapshot-style undo logging.
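A minimal sketch of snapshot-style (copy-on-write) undo logging follows, assuming a file modelled as a `bytearray` split into fixed-size blocks. The class and method names are hypothetical. The first time a block is overwritten after the snapshot point (here, the moment a peer disconnects), its original contents are copied into the log. Later writes to the same block are not logged again. The set of logged blocks is then exactly the region a returning server would need to resync. Unlike a fixed-size write-ahead journal, the log grows with the number of distinct blocks touched.

```python
class SnapshotUndoLog:
    """Copy-on-write undo log over an in-memory 'file' split into fixed blocks.

    Hypothetical illustration; assumes writes stay within the existing file size.
    """

    def __init__(self, data: bytearray, block_size: int):
        self.data = data
        self.block_size = block_size
        self.saved: dict[int, bytes] = {}  # block index -> contents at snapshot time

    def write(self, offset: int, payload: bytes) -> None:
        first = offset // self.block_size
        last = (offset + len(payload) - 1) // self.block_size
        for blk in range(first, last + 1):
            if blk not in self.saved:  # only the first overwrite of a block is logged
                start = blk * self.block_size
                self.saved[blk] = bytes(self.data[start:start + self.block_size])
        self.data[offset:offset + len(payload)] = payload

    def changed_blocks(self) -> list[int]:
        """Blocks modified since the snapshot, i.e. what a peer would need to resync."""
        return sorted(self.saved)

    def rollback(self) -> None:
        """Restore every logged block to its snapshot-time contents."""
        for blk, old in self.saved.items():
            start = blk * self.block_size
            self.data[start:start + len(old)] = old
        self.saved.clear()


f = bytearray(b"A" * 32)
log = SnapshotUndoLog(f, 8)
log.write(6, b"xyz")  # spans blocks 0 and 1, so both are logged
log.write(7, b"q")    # block 0 is already logged, so nothing new is saved
```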
The problem with this is that you have to:
1) Categorically establish whether each server is connected and up to date
for the file being checked, and only log if a server has disconnected.
This check itself involves overhead.
2) For each server that is down at the time, every other server would have
to start writing snapshot-style undo logs (which would have to be
per server) for all the files being changed. This effectively multiplies
the disk write traffic on all the working, up-to-date servers by the
number of offline servers.
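Point 2 can be put into a rough model. It assumes that each online server writes the logical data once and also writes one undo-log copy of it per offline peer. It ignores the reads of old blocks that copy-on-write also needs, and any metadata. The function name is hypothetical.

```python
GB = 10**9


def write_traffic(total_servers: int, offline: int, logical_bytes: int) -> dict[str, int]:
    """Rough model of disk writes when `offline` of `total_servers` are down.

    Each online server writes the data itself plus one undo-log copy per offline
    peer. Reads of old blocks and metadata writes are ignored (an assumption).
    """
    if not 0 <= offline < total_servers:
        raise ValueError("need at least one online server")
    online = total_servers - offline
    per_server = logical_bytes * (1 + offline)
    return {"online": online, "per_server": per_server, "cluster": per_server * online}


for n, k in [(2, 1), (4, 1), (4, 2), (8, 4)]:
    t = write_traffic(n, k, 1 * GB)
    print(f"{n} servers, {k} down: {t['per_server'] // GB} GB written per online server")
```

For 1GB of application writes, this prints 2, 2, 3 and 5 GB per online server. The cost tracks the number of servers that are down, not the size of the change.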
The problem, then, is that faster resyncs after small changes come at the
cost of a massive slowdown in normal operation whenever multiple servers
are down. As the number of servers grows, this rapidly stops being a
workable solution.
Gordan