Re: Versioning

Fred van Zwieten <fvzwieten@xxxxxxxxxxxxx> · Thu, 2 Aug 2012 17:12:21 +0200

On Wed, Jul 25, 2012 at 11:20 PM, Fred van Zwieten <fvzwieten@xxxxxxxxxxxxx> wrote:

"Now I am leaning towards git based versioning. Integrate git into
GlusterFS to track changes on specified events (timer, file-close,
dir-tree-modify..). We may not do this via translator interface, but
through the newly proposed simple event/timer interface."

I am not sure I would like that. Our idea is to make the 
previous versions (read-only!) available to the end-users through a 
separate mount-point, taking file permissions into account. I am not 
sure if that is at all possible when they live inside a git repository.

(Disclaimer: I do not know the inner workings of glusterfs or its
translators.) I would think making it part of the receiving side of the
geo-replicator translator would be ideal, because it knows what is
going on. If a file /a/b/c is updated, its previous version could be
stored as /pre/a/b/c.<datetime> or /pre/<datetime>/a/b/c.
If the previous versions live on the same file-system you could even
play with inodes to keep only the previous versions of changed blocks.
This would make it very space efficient (sort of file based snapshotting).
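
To make the two naming schemes concrete, here is a minimal Python sketch. The function name, the /pre root default and the timestamp format are assumptions for illustration only; nothing like this exists in glusterfs as far as I know.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def versioned_paths(path, when, root="/pre"):
    """Return both candidate locations for the previous version of `path`.

    `when` is assumed to be a UTC datetime; the compact timestamp format
    below is an arbitrary choice, not something the thread prescribes.
    """
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    rel = PurePosixPath(path).relative_to("/")
    per_file = PurePosixPath(root) / (str(rel) + "." + stamp)  # /pre/a/b/c.<datetime>
    per_time = PurePosixPath(root) / stamp / rel               # /pre/<datetime>/a/b/c
    return str(per_file), str(per_time)
```

The first form keeps all versions of one file next to each other; the second form gives one browsable tree per point in time, which may map more naturally onto a separate read-only mount-point.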

I do agree that using git makes it more modular and independent of the geo-replicator translator.

I am also curious how you would handle multiple writes in a short
time to the same file without ending up with an equal number of
previous versions.

Also, I can't find the note you are referring to. Could you please make a feature wiki page using the template?

Fred

We broke GeoReplication into two parts: (1) Marker, a change tracking
translator, and (2) a simple queue that queries changes and invokes
rsync with a specific list of files over ssh. Unlike inotify, the marker
framework keeps track of changes within the filesystem as xtime in
extended attributes. You can ask the filesystem to list all changed
files and folders since a particular point in time. This way, an
external service can tolerate a crash, WAN link failure, etc. Marker
allows developers to extend storage capabilities using a simple
application programming model (even scripting languages are OK).
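
The "list everything changed since time T" query model can be sketched in Python as follows. Note the loud assumption: this walks a mounted tree and compares ordinary mtime values as a stand-in. It does not read the marker's xtime extended attributes, and changed_since is a hypothetical helper, not a glusterfs API.

```python
import os


def changed_since(root, since_epoch):
    """Yield file paths under `root` modified after `since_epoch` (seconds).

    Stand-in only: uses st_mtime, not the marker framework's xtime xattr.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if os.lstat(full).st_mtime > since_epoch:
                    yield full
            except FileNotFoundError:
                # The file vanished between listing and stat; skip it.
                continue
```

Because the consumer only has to remember the time of its last successful run, it can crash or lose its link and simply ask again with the same timestamp later, which is the property described above.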

If certain tasks can be achieved outside of a translator, it is good to
do so. Just like kernel mode, translator mode has some limitations.
Translator code has to be efficient, asynchronous (event driven),
latency sensitive and free of memory leaks. If we extend the marker
framework idea into a generic event hook mechanism, we can develop
powerful storage applications outside of the translator mechanism. Say
you register your tool or script for certain events. When the event
occurs, your code gets invoked with the necessary parameters. You could
then operate on the mounted filesystem itself, just as any other
application. For example, you register a git script for invocation on
an event, say "whenever a registered directory tree is modified and
more than 30 minutes have elapsed". All this script does is push changes
to an external origin. It is crude and simple, but achieves the goal.
Simple is better. You may also develop anti-virus plugins or silent data
corruption checks using this technique. Users can use a simple git
checkout to flip between views. Because git doesn't scale for large
content, you can limit users to explicitly register interested folders
for versioning. If you want to create a mountable view of the remote
content, you can write a translator to trap chdir or lookup for
directories named after a timestamp and perform a git checkout. If I
used git for continuous automated file system versioning, I would
suggest that users use the git tool itself as the UI.
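
The "modified and more than 30 minutes elapsed" rule can be sketched as a small Python class. This is only an illustration of the proposed event/timer idea: DebouncedHook, notify_modified and tick are hypothetical names, and no such registration interface is confirmed to exist in glusterfs.

```python
import time


class DebouncedHook:
    """Run `action` at most once per `min_interval` seconds while changes are pending."""

    def __init__(self, action, min_interval=30 * 60, clock=time.monotonic):
        self.action = action
        self.min_interval = min_interval
        self.clock = clock
        self.last_run = clock()
        self.dirty = False

    def notify_modified(self):
        # Called on every change event; cheap, it only sets a flag.
        self.dirty = True

    def tick(self):
        # Called periodically (for example from a timer). Fires the action
        # only if something changed and enough time has passed.
        now = self.clock()
        if self.dirty and now - self.last_run >= self.min_interval:
            self.action()
            self.last_run = now
            self.dirty = False
            return True
        return False
```

In practice the action could be a script that commits and pushes the registered tree. A side effect of this shape is that any number of writes inside one interval collapse into a single invocation, which is one possible way to avoid creating a version per write.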

I am just giving you tips and suggestions. Don't limit your ideas in any way.

If I am guessing your idea correctly, it will have a few limitations, but one can live with them.

 * Only files are versioned. Directories are not.
 * File renames and Directory renames (mv) are not supported.
 * Every version is a complete duplicate copy (not as COW or WAFL).
 * Changes are tracked at per file level. Changes across a directory
   tree are not grouped. I mean cvs style, not like git as a patch set.

It is actually OK to make duplicate copies of changed files. In reality,
for most practical use cases, very few files across the name space get
modified. Most of the files are written once and rarely modified. Files
older than 30 days are hardly accessed. So it is OK to store duplicate
copies of just the changed files. btrfs or device-mapper dedup may
take care of this as well. I won't worry too much about duplicating
data, given its very small proportion.
I didn't quite understand how you can play with inodes to avoid this duplication. Did you mean a btrfs-dedup-like capability?

If you want to avoid these limitations, think about rdiff-backup style
continuous automated backup. Just like georep, you monitor the
filesystem for changes and back up on a continuous basis. It is OK to
give users a tool or API to restore/view older versions. This is much
simpler to implement than a WAFL or COW style storage format and file
level snapshotting.
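
To illustrate the general idea of keeping the current file whole and storing only what is needed to rebuild an older version, here is a deliberately naive Python sketch. It compares fixed-offset blocks, which is a simplification I am swapping in; it is not the rolling-checksum algorithm that rdiff and rsync use, and block_delta and restore_old are hypothetical helpers.

```python
def block_delta(old, new, block_size=4096):
    """Return {block_index: old_block} for every block of `old` that differs from `new`."""
    delta = {}
    nblocks = (len(old) + block_size - 1) // block_size
    for i in range(nblocks):
        start = i * block_size
        old_block = old[start:start + block_size]
        if old_block != new[start:start + block_size]:
            delta[i] = old_block
    return delta


def restore_old(new, delta, old_len, block_size=4096):
    """Rebuild the previous version from the current bytes plus the stored delta."""
    nblocks = (old_len + block_size - 1) // block_size
    out = bytearray()
    for i in range(nblocks):
        start = i * block_size
        # Use the saved old block if it changed, else reuse the current data.
        out += delta.get(i, new[start:start + block_size])
    return bytes(out[:old_len])
```

For a large file where only a small header region is rewritten in place, the delta holds just the affected blocks. If bytes are inserted and everything after them shifts, fixed-offset comparison degrades to a full copy, which is exactly the case rolling checksums are designed to handle.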

Anand,

These are all "design" decisions that we do not need and that even make the product less useful in our use-case.

We have a large archive of tiff files. Every tiff file is large (50+ MB). The images themselves do not get modified, but their EXIF metadata does. Files also get renamed and re-arranged into different directory structures. For this archive we need a scalable filesystem with georep to a second location _and_ file versioning.

"Because git doesn't scale for large content, you can limit users to explicitly register interested folders for versioning" 

Now, it seems to me git does not fit this bill, because it doesn't scale very well.

"* File renames and Directory renames (mv) are not supported"

If you mean building up retention across file renames and moves, I agree for our use-case, but others might need it. Look at backuppc for a cool solution on that.

"* Every version is a complete duplicate copy (not as COW or WAFL)."

The fact that each version is a complete duplicate is not very storage friendly, because in our use-case only the EXIF metadata changes. I seek rdiff-backup-like functionality there.

"It is actually OK to make duplicate copies of changed files. In reality,
for most practical use cases, very few files across the name space get
modified. Most of the files are written once and rarely modified. Files
older than 30 days are hardly accessed. So it is OK to store duplicate
copies of just the changed files. btrfs or device-mapper dedup may
take care of this as well. I won't worry too much about duplicating
data, given its very small proportion."

I do not agree with you. If you say most of the files are written once and rarely modified, you are narrowing the use-case for glusterfs. You are describing near-WORM. Our use-case is not that. Also, our files do get modified after 30 days. Relying on dedup at the lower fs level is also not good. Suppose you have a 200TB filesystem. Post-process dedup would take a very long time to find the dups there. Better to do it inline. Again, look at backuppc for an implementation example.
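
As a rough illustration of what "inline" could mean, here is a minimal Python sketch of whole-file dedup at store time: hash the content as it arrives and hard-link to an existing pooled copy. It is loosely in the spirit of hard-link pooling and is not backuppc's actual code; store_dedup, the pool layout and the choice of SHA-256 are all my assumptions, and it requires pool and destination to be on the same filesystem.

```python
import hashlib
import os
import shutil


def store_dedup(src, dest, pool_dir):
    """Store `src` at `dest`, sharing data blocks with any identical pooled file."""
    h = hashlib.sha256()
    with open(src, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    os.makedirs(pool_dir, exist_ok=True)
    pooled = os.path.join(pool_dir, h.hexdigest())
    if not os.path.exists(pooled):
        # First time this content is seen: copy it into the pool atomically.
        tmp = pooled + ".tmp"
        shutil.copyfile(src, tmp)
        os.replace(tmp, pooled)
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    os.link(pooled, dest)  # hard link; fails if `dest` already exists
    return pooled
```

The duplicate is detected at write time with one hash lookup, so no later scan over the whole 200TB is needed. Whole-file hashing does not help when only the EXIF block differs, so it would have to be combined with some block or delta scheme for this archive.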

Fred