Hi,

Noah Silverman wrote:

> I'm looking for both a version control system and backup system.

I am fond of this question. :)

> I guess, that I need just keep some files backed up (and/or synced) as
> they're not "working projects". I will add new documents and
> occasionally edit others, but no real need for versioning.

I suggest rsync or unison[1], and using btrfs locally if you want
snapshots. I don't know a good tool for shared snapshots, but that is
probably my ignorance.

In my humble opinion, tools designed for tracking source code, like git
and bzr, are not appropriate for this task. To illustrate this, I have
put some thoughts about how to cheat git into doing an okay job in a
footnote[4].

> Other files are working projects (possible with collaboration) and
> need active VCS.

In very small projects, I believe any free DVCS will do. What tools are
you and your collaborators already comfortable with? I hear it can be
hard to unlearn habits from using Subversion when getting started with
Git. Some other version control systems cater to that transition
better.

As projects scale in size, the speed differences between version
control systems start to matter. I find myself making larger commits,
looking through history less, and checking email more often when using
certain systems.

> From what I have read, I will effectively have multiple copies of each
> item on my hard drive, thus eating up a lot of space (One of the
> "working file" and several in the .git directory.) If I have multiple
> changes to a file, then I have several full versions of it on my
> machine.

If your files are relatively compressible (or at least rsyncable) and
you pack the repository occasionally, this should not be a problem. The
relevant page[2] of the Pro Git book probably tells more than you
wanted to know about this.

Short summary: each file is initially stored in the .git directory as a
compressed file named after its content. When asked to pack with the
"git gc"[3] command (or automatically if there are too many unpacked
objects around), git puts the data into a larger "pack file", this time
as a delta against some suitable similar blob. For source code (which
is already rather compressible), this tends to work well. My local
git/.git object repository is about 2½ times the size of the working
copy.

> This could be a problem for a directory with 100GB or more, especially
> on a laptop with limited hard drive space.

Yes. Actually, this point is why I replied. Using a source code
management system as a backup system generally implies the weird
assumption that even the oldest revisions are always worth keeping.
With big, machine-generated files, that doesn't make sense to me --- it
is better to be able to throw away some snapshots when you are running
low on space.

> 2) Sub-directory selection. On my laptop, I only want a few
> sub-directories to be synced up. I don't need my whole document tree,
> but just a few directories of things I work on.

It requires foresight, but you could use a separate filesystem for this
(possibly loop-mounted) if you want to keep snapshots. With some
symlinks, this would not require changing the directory structure.
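Something along these lines, maybe (untested; the sizes, paths, and
mount point are all invented):

    # file-backed btrfs filesystem holding just the directories you care about
    truncate -s 20G ~/precious.img
    mkfs.btrfs ~/precious.img
    sudo mkdir -p /mnt/precious
    sudo mount -o loop ~/precious.img /mnt/precious
    sudo chown $USER: /mnt/precious

    # keep the old paths working with symlinks
    mv ~/Documents/projects /mnt/precious/projects
    ln -s /mnt/precious/projects ~/Documents/projects

    # cheap snapshot before each sync; disposable when space runs low
    sudo btrfs subvolume snapshot /mnt/precious /mnt/precious/snap-$(date +%F)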
> Any and all suggestions are welcome and appreciated.

Thanks for the food for thought.

Jonathan

[1] http://www.cis.upenn.edu/~bcpierce/unison/
[2] http://progit.org/book/ch9-4.html
[3] http://www.kernel.org/pub/software/scm/git/docs/git-gc.html
[4] So, you want to use git as a general backup tool?

 . Files should be compressible. Set appropriate attributes. Use clean
   and smudge filters[5] to replace the weird working-copy
   representation with a simpler tracked form. Use !delta[6] where
   appropriate so git knows not to waste its time. (A tiny example at
   the end of this message.)

 . Files should be conducive to de-duplication. Cut large files into
   slices using rsync's rolling checksum algorithm[7].

 . Backups should be fault-tolerant. Use par2[8] or zfec[9] to protect
   pack files, maybe. (Example at the end.)

 . Sometimes metadata (file owners and modes) is important. Track a
   "restore" script that sets the appropriate metadata, and update it
   before each commit[10]. (Hook sketch at the end.)

 . Files should not change as git reads them (or it will error out).
   Wait for a quiescent state to back up, or make a snapshot some other
   way and ask git to back that up.

 . Old revisions are not precious. It would be nice to be able to
   decide when each backed-up tree can expire. My best suggestion is to
   rely on reflogs[11] instead of the revision graph to represent your
   history so old versions can expire, but getting this to work nicely
   would take some work: there is no built-in mechanism to transfer
   reflogs and associated objects to another repository, for example.
   (Rough sketch at the end.)

[5] http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html#_tt_filter_tt
[6] http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html#_tt_delta_tt
[7] http://github.com/apenwarr/bup
[8] http://parchive.sourceforge.net/
[9] http://allmydata.org/trac/zfec
[10] http://kitenet.net/~joey/code/etckeeper/
[11] http://www.kernel.org/pub/software/scm/git/docs/git-reflog.html
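A tiny example for the "compressible" point, assuming the working copy
keeps gzipped logs (untested; the filter name "gz" is made up, and note
that the gzip round trip will not reproduce byte-identical files):

    # .gitattributes: track gzipped logs in uncompressed form so deltas
    # work, and tell git not to waste time delta-compressing videos
    *.log.gz   filter=gz
    *.mp4      -delta

    # .git/config: the clean/smudge pair behind the made-up "gz" filter
    [filter "gz"]
            clean  = gzip -cd
            smudge = gzip -c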
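For the fault-tolerance point, par2 can be aimed straight at the pack
files; roughly (untested):

    # add ~10% parity data so a few flipped bits in a pack are recoverable
    cd .git/objects/pack
    par2 create -r10 packs.par2 pack-*.pack
    # later: "par2 verify packs.par2", and "par2 repair" if it complains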
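For the metadata point, etckeeper[10] is the polished version of this;
a crude pre-commit hook might look like the following (untested; GNU
stat assumed, and it will choke on funny filenames):

    #!/bin/sh
    # .git/hooks/pre-commit (remember to chmod +x)
    # Write a script that restores owner, group, and mode for every
    # tracked file; git itself only records the executable bit.
    git ls-files -z |
    xargs -0 stat -c 'chown %U:%G "%n"; chmod %a "%n"' >restore-metadata.sh
    git add restore-metadata.sh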
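And here is roughly what I mean by relying on reflogs instead of the
revision graph (untested; the "backup" branch name is invented):

    # snapshot: record the index as a parentless commit and move the
    # branch; history lives only in the branch's reflog, not in parents
    git add -A
    tree=$(git write-tree)
    commit=$(echo "backup $(date +%F)" | git commit-tree "$tree")
    git update-ref -m "backup $(date +%F)" refs/heads/backup "$commit"

    # browse old backups, let year-old entries expire, drop their objects
    git reflog show backup
    git reflog expire --expire=1.year.ago refs/heads/backup
    git gc --prune=2.weeks.ago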