On Sat, Dec 19, 2009 at 2:23 PM, Thomas Johnson <thomas.j.johnson@xxxxxxxxx> wrote:
> Hello group,
>
> I've been using git on a few different projects over the last couple of months,
> and as a former svn user I really like it. Recently, I've been using it as an
> 'electronic lab notebook' for an empirical project. My workflow looks like this:
> 1. Start with the stable code base on head
> 2. Create and change to branch 'Experiment123'
> 3. Make some changes
> 4. Run the program, which generates a giant (10MB-4G) output text file,
> Experiment123.log. Update my LabNotebook.txt file.
> 5. Were the new changes helpful?
> 5.yes: Bzip Experiment123.log, and commit it on the branch. Merge the
> Experiment123 branch to head and goto 1.
> 5.no: Bzip Experiment123.log, and commit it on the branch. Merge LabNotebook.txt
> and Experiment123.log back to head. Switch back to head and goto 1.
>
> The thing is, Experiment123.log is going to be very similar to Experiment122.log
> and Experiment124.log except for a few details. My understanding is that git is
> great at compressing groups of files like this, is that correct? Should I not be
> bzipping them myself? On the other hand, I don't want HEAD to contain hundreds
> of gigs of uncompressed files that bzip down to only a few hundred megs.
>
> Any thoughts on the workflow itself would also be very welcome.

I have used a similar workflow myself for parametric studies on some genetic
algorithms. Below are my observations related to your question:

* Saving the entire log file (zipped or not) in the repository has drawbacks
  when cloning. In my setup I ran the tests in parallel on a different machine
  and used Git to synchronize between the development machine and the test
  machine. The problem was that whenever I wanted to "clean" the test machine
  and start over, I had to clone the repository, which also held all the
  unneeded log files.

* (Actually, I used two Git repositories: one for the actual source code, where
  I make the commits by hand, and another one which I use for the
  synchronization.)

* Even if you prefer keeping the logs, it's best to let Git handle the
  compression. Even if only some small parts change from the original text
  file, I would guess that the bzipped file looks quite different, so Git could
  no longer exploit the similarity between Experiment122.log and
  Experiment123.log.

* Maybe it would be better, instead of keeping the whole experiment log, to
  keep just a summary of it (only the important stuff). If you do need the
  entire log, you can always recreate it by running the code again. (This was
  the road I took in the end, by keeping a small SQLite database for each
  experiment.)

* (And of course there is another little trick I've used: put the log files in
  a `log` directory which is "git-ignored". That way you can switch between
  branches and Git won't touch the `log` directory, unless you force it by
  issuing `git clean -f -d -x`.)

Hope I've been useful,
Ciprian.
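
The "keep a summary in SQLite instead of the whole log" idea could be sketched roughly as follows. This is only an illustration, not the setup described above: the log line format (`generation=... best_fitness=...`), the table layout, and the function names `summarize` and `store_summary` are all assumptions made up for the example.

```python
import re
import sqlite3

# Hypothetical log format (an assumption for this sketch): result lines
# look like "generation=12 best_fitness=0.8731"; all other lines are noise.
LINE_RE = re.compile(r"generation=(\d+)\s+best_fitness=([0-9.eE+-]+)")


def summarize(log_lines):
    """Reduce a full log to the few numbers worth keeping."""
    generations = 0
    best = None
    for line in log_lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip everything that is not a result line
        generations = max(generations, int(match.group(1)))
        fitness = float(match.group(2))
        if best is None or fitness > best:
            best = fitness
    return generations, best


def store_summary(db_path, experiment, log_lines):
    """Insert or replace one summary row for the given experiment."""
    generations, best = summarize(log_lines)
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS summary ("
                "experiment TEXT PRIMARY KEY, "
                "generations INTEGER, "
                "best_fitness REAL)"
            )
            conn.execute(
                "INSERT OR REPLACE INTO summary VALUES (?, ?, ?)",
                (experiment, generations, best),
            )
    finally:
        conn.close()
    return generations, best
```

With the raw logs kept in the ignored `log` directory, a call such as `store_summary("experiments.db", "Experiment123", open("log/Experiment123.log"))` would record one small row per experiment, and only the database (or nothing at all) would need to be committed.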