Re: [PATCH] don't use mmap() to hash files

On Mon, 15 Feb 2010, Avery Pennarun wrote:

> - git-prune only prunes unpacked objects
> 
> - git-repack claims to be willing to explode unreachable objects back
> into loose objects with -A, but I'm not quite sure if its definition
> of "unreachable" is the same as mine.  

Unreachable means not reachable from the given rev-list 
specification.  So if you give it --all --reflog then it means any 
object that is not reachable through your branches, tags or 
reflog entries.
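
To make that definition concrete, here is a rough Python sketch, not
anything Git ships.  It assumes it is run inside a repository, and the
function names (parse_object_ids, reachable_objects,
unreachable_objects) are made up for illustration.  It asks
git-rev-list for everything reachable from a given specification and
treats anything else as unreachable:

```python
import subprocess


def parse_object_ids(rev_list_output):
    """Collect object IDs from `git rev-list --objects` output.

    Each line is "<sha1>" or "<sha1> <path>"; only the first field matters.
    """
    ids = set()
    for line in rev_list_output.splitlines():
        fields = line.split()
        if fields:
            ids.add(fields[0])
    return ids


def reachable_objects(rev_args=("--all", "--reflog")):
    """Run rev-list with the given specification (hypothetical helper)."""
    result = subprocess.run(
        ["git", "rev-list", "--objects", *rev_args],
        capture_output=True, text=True, check=True,
    )
    return parse_object_ids(result.stdout)


def unreachable_objects(candidates, reachable):
    """Keep the candidates that the specification does not reach."""
    return [oid for oid in candidates if oid not in reachable]
```

Changing rev_args changes what counts as unreachable, which is the
whole point: the answer depends on the specification you pass.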

> And I'm not sure rewriting a
> pack with -A makes the old pack reliably unreachable according to -d.

Reachability doesn't apply to packs.  That applies to objects.  And 
unreachable objects may be copied to loose objects with -A, or simply 
forgotten about with -a.  Then -d will literally delete the old pack 
file.

> - there seems to be no documented situation in which you can ever
> delete unused objects from a pack without using repack -a or -A, which
> can be amazingly slow if your packs are huge.  (Ideally you'd only
> repack the particular packs that you want to shrink.)  For example, my
> bup repo is currently 200 GB.

Ideally you don't keep volatile objects in huge packs.  That's why we 
have .keep files: they flag the packs that are huge and pure so that 
repacking does not touch them anymore.
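
For illustration, flagging a pack only takes a file with the same name
as the pack but a .keep extension sitting next to it.  A small Python
sketch, with hypothetical helper names and an example path that you
would replace with a real pack:

```python
from pathlib import Path


def keep_file_for(pack_path):
    """Return the .keep path that sits next to the given .pack file."""
    pack = Path(pack_path)
    if pack.suffix != ".pack":
        raise ValueError(f"not a pack file: {pack}")
    return pack.with_suffix(".keep")


def mark_pack_as_kept(pack_path, reason=""):
    """Create the .keep file; its content is free-form, so store a note."""
    keep = keep_file_for(pack_path)
    keep.write_text(reason + "\n" if reason else "")
    return keep
```

Something like mark_pack_as_kept(".git/objects/pack/pack-<sha1>.pack",
"huge and pure") would do it, and removing the .keep file makes the
pack eligible for repacking again.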

Incremental repacking is there to gather only those _reachable_ loose 
objects into a new pack.  The objects that you're likely to make 
unreachable are probably going to come from a temporary branch that you 
deleted, so they are likely to live only in the latest, small 
pack.

And repacking can be done unattended and in parallel to normal Git 
operations with no issues.  So even if it is slow to repack huge packs, 
it is something that you might do during the night and only once in a 
while.

But if you really want to shrink only one pack without touching the 
other packs, and you do know which objects have to be removed from that 
pack, then it is trivial to write a small script that runs 
git-show-index, sorts the output by offset, filters out the unwanted 
objects, keeps only the SHA1 column, and feeds the result into 
git-pack-objects.  Oh and delete the original pack when done of course.  
It is also trivial to generate the list of all packed objects, compare 
it to the list of all reachable objects, and prune the unreachable 
objects from the packs that contain them.
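
A rough sketch of such a script in Python, under a few assumptions: it
runs inside the repository, the set of unwanted SHA1s is already
known, and the names surviving_objects and shrink_pack are invented
here.  It relies on git-show-index printing one "<offset> <sha1>" entry
per line (with a CRC in parentheses for v2 indexes) and on
git-pack-objects reading object names from stdin and printing the new
pack's name.  It does not delete the original pack; that step is left
to you once the new pack has been checked.

```python
import subprocess
from pathlib import Path


def surviving_objects(show_index_output, unwanted):
    """Return the SHA1s to keep, ordered by their offset in the old pack."""
    entries = []
    for line in show_index_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        offset, sha1 = int(fields[0]), fields[1]
        if sha1 not in unwanted:
            entries.append((offset, sha1))
    entries.sort()  # sort by pack offset
    return [sha1 for _, sha1 in entries]


def shrink_pack(idx_path, unwanted, pack_dir):
    """Write a new pack holding only the wanted objects of one old pack."""
    # git show-index reads the .idx file on stdin.
    with open(idx_path, "rb") as idx:
        listing = subprocess.run(
            ["git", "show-index"],
            stdin=idx, capture_output=True, text=True, check=True,
        ).stdout
    keep = surviving_objects(listing, unwanted)
    # pack-objects writes <base>-<name>.pack/.idx and prints <name>.
    result = subprocess.run(
        ["git", "pack-objects", str(Path(pack_dir) / "pack")],
        input="".join(sha1 + "\n" for sha1 in keep),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Feeding it the difference between the packed object list and the
reachable object list as the unwanted set covers the second case as
well.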


Nicolas