This is a resend of previous mail reformatted. (Unfortunately even university staff acounts these days seem to only give you unconfigurable web-browser interfaces; hopefully gmail won't butcher the 72-character wrapped layout I'm seeing on my screen right now too much.) Many thanks for your forbearance and any insight. Original message<<<<<<<<<<<<<<<<<<<<<<<< Hi, I'm working on using git for chronological backups (think selective P9 venti), which I've almost got working. (I know I'm using git for something it wasn't designed for, and is arguably perverse.) I've been trying to figure out why files which "ought" to be tracked weren't in the database in certain commits and I think I've figured out why. Can someone confirm the following: with the set of operations git-init-db .... git-add path/to/fileX git-commit -a -m blah <1> ..... changes to things including fileX and commits [fileX vanishes from the tree being tracked, but nothing mentioned to git] <2> ..... changes to things and commits [file X reappears in the working tree being tracked] <3> git-commit -a -m blah <4> The git trees from <1> to <2> all contain fileX, even if its contents haven't changed. Between <2> and <4> the git tree doesn't contain fileX (perfectly properly). From <4> onwards fileX still doesn't appear in the git trees recorded from the working directory even though fileX is there again. If I want fileX tracked I have to explicitly git-add it again at <3>. (Ie, git-commit -a when it detects a file have vanished from the working tree removes it from the files git will look at in future for changes to commit.) Is it also correct that this behaviour ("forget" about a tracked file when it disappear from the working tree in a commit) would be difficult to change without major surgery to git? Would there be any problems with git-add'ing every file you want the tree to track before every commit? (I'm currently working on code to keep track of things that have ever been tracked, and whether they're currently in the tree, in my scripts outside git but obviously partially duplicating stuff git has in its datastructures has the potential for subtle bugs when they diverge.) Long story: --------------- I'm trying to move a snapshotting-style system from my personal hack-job to git. As well as manual snapshots there's a cron job that runs every hour to snapshot stuff. Consequently there will be "automatic commits" when you wouldn't have made one if you were doing normal source control, eg, after you've wrongly deleted a file and before you've noticed & restored it from the database an automatic commit can come in (and even more kinky situations you don't want to know about) and so sees the file "gone". Sidenote: I'm moving the database from the old format to the new one by repeatedly unpacking the old database for snapshot X, git-add'ing any file names which have _never_ been in any snapshot before, git-commit -a, git-tag, then remove all the files unpacked by the old database and move onto snapshot X+1. This takes less than a second per snapshot. I understand this shouldn't be a problem but just to let people know the timestamps on the files aren't what would be expected. Follow up clarification msg <<<<<<<<<<<<<<<<<<<<<<<< |Not sure how large your snapshots are -- a |second sounds like a long time for git operations. While it is a bit |more complex, you _can_ operate directly on the index, and the |"snapshot" never needs to hit the disk as such during your migration. By trying to be brief I was a rather cryptic. What I was trying to say was: Running the git commands earlier in the message in a script, I see that file are not present from the git tree generated by a commit at a time when I know the file I'd previously git-added reappeared in the working directory. I'm hypothesising that this is because when the file disappears the machinery in git loses the `track this file' information. However, I haven't (and would prefer not to) dig into the git code to check that's the correct explanation. If this is why the files aren't being tracked I can try to script around the issue by git-adding all the files I want tracked by the snapshot before the git-commit -a. To help anyone thinking about if the explanation is right, the working directory is repeatedly being wiped and refilled from my old backup system with a second, so often all files have both creation and modification times set to the current second. This is a really weird thing to do and might in some way be responsible for the untracked file (cf racy-git)." Most of the maybe half-second overhead is coming from my script unpacking the files with gzip from my old database; git seems more than fast enough. |Have a look at how the cvsimport script works for an example. As its my personal db which I'll only convert once if I can just make replaying work I don't need anything more complicated; I've only got 2000-odd snapshots. -- cheers, dave tweed__________________________ david.tweed@xxxxxxxxx Rm 124, School of Systems Engineering, University of Reading. Details are all that matters; God dwells there, and you never get to see Him if you don't struggle to get them right. -- Stephen Jay Gould - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html