Re: [Request] Git export with hardlinks

On Sun, Feb 10, 2013 at 11:33:26AM +0100, Thomas Koch wrote:

> thank you very much for your idea! It's good and simple. It just breaks down 
> for the case when a large folder got renamed.

Yes, it would never find renames, which a true sha1->path map could.

> But I already hacked the basic layout of the algorithm and it's not 
> complicated at all, I believe:
> 
> https://github.com/thkoch2001/git_export_hardlinks/blob/master/git_export_hardlinks.py

It looks like you create the sha1->path mapping by asking the user to
provide <tree_sha1>,<path> pairs, and then assuming that the exported
tree at <path> exactly matches <tree_sha1>. Which it would in the
workflow you've proposed, but it is also easy for that not to be the
case (e.g., somebody munges a file in <path> after it has been
exported).

So it's a bit dangerous as a general purpose tool, IMHO. It's also a
slight pain in that you have to keep track of the tree sha1 for each
exported path somehow.

A safer and more convenient (but slightly less efficient) solution would
be to keep a git index file for each exported tree. Then we can just
refresh that index, which would check that our sha1 for each path is up
to date (and in the common case where nothing has changed, would only be
as expensive as stat()-ing each entry). And then we use that index as
the sha1->path map.
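
To make the index idea concrete, here is a minimal sketch in Python (the
language of the script above). The function names are made up for
illustration, and it assumes each export directory is the work tree of
a git repo. It refreshes the index, drops any path that git reports as
modified, and builds the sha1->path map from what is left. It ignores
file modes, which a real tool would also have to compare, since hard
links share permissions.

```python
import subprocess


def parse_ls_files_stage(output, modified=()):
    """Build {sha1: path} from `git ls-files -s -z` output.

    Each NUL-terminated entry looks like "<mode> <sha1> <stage>\t<path>".
    Paths listed in `modified` are skipped, because their on-disk
    content no longer matches the sha1 recorded in the index.
    """
    modified = set(modified)
    sha1_to_path = {}
    for entry in output.split("\0"):
        if not entry:
            continue
        meta, path = entry.split("\t", 1)
        mode, sha1, stage = meta.split(" ")
        # Keep only merged, regular files (no symlinks or submodules).
        if stage != "0" or mode not in ("100644", "100755"):
            continue
        if path in modified:
            continue
        # Several paths may share a blob; any one of them can be a source.
        sha1_to_path.setdefault(sha1, path)
    return sha1_to_path


def index_sha1_map(export_repo):
    """Refresh the index of `export_repo` and return its sha1->path map."""
    def git(*args):
        result = subprocess.run(
            ("git", "-C", export_repo) + args,
            stdout=subprocess.PIPE, universal_newlines=True, check=False)
        return result.stdout

    # Usually only a stat() per entry when nothing has changed.
    git("update-index", "-q", "--refresh")
    # Paths whose work tree content differs from the index.
    changed = git("diff-files", "--name-only", "-z").split("\0")
    modified = [p for p in changed if p]
    return parse_ls_files_stage(git("ls-files", "-s", "-z"), modified)
```

The parsing is split out from the git calls so it can be checked
without a repository at hand.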

The simplest way to have an index for each export would be to actually
give each one its own git repo (which does not have to use much space,
if you use "git clone -s" to share the objects with the master repo).

That's more complex, and uses more disk than what your script does, but
I do think the added safety would be worth it for a general-purpose
tool.
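
As a rough sketch of the one-repo-per-export setup (helper names are
hypothetical, and "rev" is assumed to be a commit-ish rather than a bare
tree sha1), the two git invocations could be wrapped like this:

```python
import subprocess


def shared_export_commands(master_repo, export_dir, rev):
    """Return the git command lines for one export with shared objects."""
    return [
        # -s borrows objects from master_repo via alternates instead of
        # copying them; -n skips the initial checkout.
        ["git", "clone", "-s", "-n", master_repo, export_dir],
        # Populate the work tree (and the index) at the requested revision.
        ["git", "-C", export_dir, "checkout", "--detach", rev],
    ]


def make_shared_export(master_repo, export_dir, rev):
    """Run the commands above, stopping at the first failure."""
    for argv in shared_export_commands(master_repo, export_dir, rev):
        subprocess.run(argv, check=True)
```

Note that a clone made with "-s" depends on the master repo's object
store staying intact, so pruning objects there can break the exports.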

> I had to interrupt work on this and could not yet finish and test it. But I 
> thought you might be interested. Maybe something like this might one day be 
> rewritten in C and become part of git core?

I think if we had a `git export` command (and we do not, but there has
been discussion in a nearby thread about whether such a thing might be a
good idea), having a `--hard-link-from` option to link with other
checkouts would make sense. It could also potentially be an option to
git-checkout-index, and you could script around it at that low level.
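
No such option exists today, but the core of what it would do can be
sketched in a few lines of Python. Everything here is an assumption for
illustration: "entries" is the (sha1, path) list of the tree being
exported, "sha1_to_path" maps sha1s to already verified files in another
export, and "read_blob" is whatever callable fetches blob contents when
no link source is available.

```python
import os


def export_with_hardlinks(entries, sha1_to_path, read_blob, dest_dir):
    """Write `entries` under dest_dir, hard-linking where possible.

    Returns the number of files that were linked rather than written.
    """
    linked = 0
    for sha1, rel_path in entries:
        target = os.path.join(dest_dir, rel_path)
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        source = sha1_to_path.get(sha1)
        if source is not None:
            try:
                os.link(source, target)
                linked += 1
                continue
            except OSError:
                # E.g. the source vanished or lives on another
                # filesystem; fall through and write a fresh copy.
                pass
        with open(target, "wb") as f:
            f.write(read_blob(sha1))
    return linked
```

Because linked files share one inode, anything that later edits an
exported file in place would change every export linking to it, which is
another reason to verify the map before trusting it.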

-Peff