On Wed, 30 Apr 2008, Martin Fick wrote:
No, no need to keep the old data around. We only need to remember the start and span of each changed section along with the file version of the change! This is much easier and more space-efficient than snapshots. Excuse me for being ignorant of the actual sizes of these three parameters, but they can't be larger than 8 bytes each, can they? 8*3 = 24 bytes. A 100MB journal filesystem could store over 4 million different file changes!
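To put numbers on the entry size, here is a minimal sketch in Python. It assumes each of the three parameters is an unsigned 64-bit integer and that a "100MB" log means 100 * 1024 * 1024 bytes. The field order and the names ENTRY, pack_entry and log_capacity are hypothetical, not an actual GlusterFS on-disk format.

```python
import struct

# Hypothetical entry layout (an assumption, not a real GlusterFS format):
# three little-endian unsigned 64-bit integers: file version, start, span.
ENTRY = struct.Struct("<QQQ")


def pack_entry(version, start, span):
    """Serialise one change record into a fixed-size byte string."""
    return ENTRY.pack(version, start, span)


def log_capacity(log_bytes):
    """Number of whole entries that fit in a log of log_bytes bytes."""
    return log_bytes // ENTRY.size


entry = pack_entry(7, 4096, 512)
print(len(entry))                       # 24
print(log_capacity(100 * 1024 * 1024))  # 4369066
```

Fixed-size entries keep the log trivially seekable: entry N lives at byte offset N * 24, so no index is needed.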
OK, you're right. I follow what you mean now. Rather than keeping all versions, just keep the (version, start, finish) pointers in the log. That way a client can see what has happened since its version and sync only the blocks listed; if the log has rolled over, it (r)syncs the whole file. Sounds like a good idea.

The next question is where to keep the log. One log per file? One log per directory? How to store them? Shadow files? A separate shadow volume? A shadow volume might be a good idea because it keeps the main source mounted directory exactly the same as a normal directory.

A (shadow volume) log should, ideally, also keep additional sanity-check information such as file metadata (timestamps, size), to cross-check whether something went weird and the file was changed underneath GlusterFS. If it was, flush the log and force a full resync of the file.
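A rough sketch of how such a per-file log might behave, in Python for brevity. Everything here is an assumption for illustration: the ChangeLog class, its method names, the in-memory bounded deque standing in for a fixed-size on-disk log, and the use of (size, mtime) as the only cross-check metadata. It is not how GlusterFS does anything today.

```python
from collections import deque


class ChangeLog:
    """Hypothetical per-file log of (version, start, finish) entries."""

    def __init__(self, max_entries):
        # A bounded deque drops the oldest entry when full (log rollover).
        self.entries = deque(maxlen=max_entries)
        self.version = 0
        self.meta = None  # (size, mtime) as of the last logged write

    def record(self, start, finish, size, mtime):
        """Log one write made through the filesystem."""
        self.version += 1
        self.entries.append((self.version, start, finish))
        self.meta = (size, mtime)

    def changes_since(self, client_version, size, mtime):
        """Return the (start, finish) ranges to sync, or None when the
        client has to resync the whole file."""
        if self.meta is not None and self.meta != (size, mtime):
            # File changed underneath us: flush the log, bump the version
            # so every client is stale, and force a full resync.
            self.entries.clear()
            self.version += 1
            self.meta = (size, mtime)
            return None
        if client_version >= self.version:
            return []  # client is already up to date
        if not self.entries or self.entries[0][0] > client_version + 1:
            return None  # log rolled over past the client's version
        return [(s, f) for v, s, f in self.entries if v > client_version]


log = ChangeLog(max_entries=2)
log.record(0, 10, size=100, mtime=1.0)
log.record(20, 30, size=100, mtime=2.0)
print(log.changes_since(1, size=100, mtime=2.0))  # [(20, 30)]
```

The size and mtime passed to changes_since are whatever stat() reports for the file at sync time; a mismatch with what the log last recorded is taken to mean that the file was modified outside the filesystem.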
Gordan