On Sun, Feb 10, 2013 at 11:33:26AM +0100, Thomas Koch wrote:

> thank you very much for your idea! It's good and simple. It just breaks down
> for the case when a large folder got renamed.

Yes, it would never find renames, which a true sha1->path map could.

> But I already hacked the basic layout of the algorithm and it's not
> complicated at all, I believe:
>
> https://github.com/thkoch2001/git_export_hardlinks/blob/master/git_export_hardlinks.py

It looks like you create the sha1->path mapping by asking the user to
provide <tree_sha1>,<path> pairs, and then assuming that the exported
tree at <path> exactly matches <tree_sha1>. Which it would in the
workflow you've proposed, but it is also easy for that not to be the
case (e.g., somebody munges a file in <path> after it has been
exported). So it's a bit dangerous as a general-purpose tool, IMHO.
It's also a slight pain in that you have to keep track of the tree sha1
for each exported path somehow.

A safer and more convenient (but slightly less efficient) solution would
be to keep a git index file for each exported tree. Then we can just
refresh that index, which would check that our sha1 for each path is up
to date (and in the common case of nothing changed, would only be as
expensive as stat()-ing each entry). And then we use that index as the
sha1->path map.

The simplest way to have an index for each export would be to actually
give each one its own git repo (which does not have to use much space,
if you use "-s" to share the objects with the master repo). That's more
complex, and uses more disk than what your script does, but I do think
the added safety would be worth it for a general-purpose tool.

> I had to interrupt work on this and could not yet finish and test it. But I
> thought you might be interested. Maybe something like this might one day be
> rewritten in C and become part of git core?
I think if we had a `git export` command (and we do not, but there has
been discussion in a nearby thread about whether such a thing might be a
good idea), having a `--hard-link-from` option to link with other
checkouts would make sense. It could also potentially be an option to
git-checkout-index, and you could script around it at that low level.

-Peff
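For illustration, here is a rough Python sketch of the "verify, then link" idea from this thread. Rather than trusting user-supplied <tree_sha1>,<path> pairs, it re-hashes every file in an earlier export to build a blob sha1->path map, then hard-links any file in a new export whose content already exists there, even under a renamed directory. All function names are hypothetical; they are not part of git, of the script linked above, or of any real `--hard-link-from` option. The sketch assumes SHA-1 object ids and that both exports live on the same filesystem (a requirement of `os.link`). It also re-hashes everything on each run, whereas refreshing a per-export git index would skip unchanged files using their stat() data.

```python
import hashlib
import os


def git_blob_sha1(data: bytes) -> str:
    """Object id git gives a blob with this content: sha1("blob <len>\\0" + data)."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def build_sha1_path_map(export_root: str) -> dict[str, str]:
    """Hypothetical helper: map blob sha1 -> one existing file under export_root.

    Every file is re-hashed, so a file munged after export simply maps to
    its new content's sha1 instead of silently matching a stale tree sha1.
    """
    sha1_to_path: dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(export_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks to keep the sketch simple
            with open(path, "rb") as f:
                sha1_to_path.setdefault(git_blob_sha1(f.read()), path)
    return sha1_to_path


def export_with_hardlinks(files: dict[str, bytes], dest: str,
                          sha1_to_path: dict[str, str]) -> int:
    """Hypothetical helper: write files (relative path -> content) under dest.

    A file whose blob sha1 is already in sha1_to_path is hard-linked from
    that earlier export instead of being written. Returns the link count.
    """
    linked = 0
    for rel, data in files.items():
        target = os.path.join(dest, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        source = sha1_to_path.get(git_blob_sha1(data))
        if source is not None:
            os.link(source, target)  # same inode: content shared on disk
            linked += 1
        else:
            with open(target, "wb") as f:
                f.write(data)
    return linked
```

One caveat applies to any hard-link scheme: linked files share an inode, so editing one export in place (or changing its mode bits) changes the other too. That is another reason to re-verify content before trusting an old export as a link source.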