Here's a re-roll of the patch series I posted earlier to make "git
prune" keep more contiguous chunks of the object graph. The cleanups to
t5304 were spun off into their own series, and are dropped here.
However, the other patches seem to have multiplied in number (I must
have fed them after midnight).

Here are the changes since the first round (thanks everybody for your
comments):

  - fix bogus return values from freshen_file, foreach_alt_odb, and
    for_each_packed_object

  - make for_each_object_in_pack static

  - clarify commit message for "keep objects reachable from recent
    objects" patch (this was the one that confused Junio, and I
    elaborated based on our discussion)

  - clarify the definition of "loose object dirs" in the comment above
    for_each_loose_file_in_object_dir

  - in for_each_loose_file, traverse hashed loose object directories in
    numeric order, and pass the number to the subdir callback (this is
    used by prune-packed for its progress updates); as a side effect,
    this fixes the bugs Michael noticed with the subdir callback.

  - prune-packed now reuses the for_each_loose_file interface

  - use revs->ignore_missing_links so we don't barf on already-missing
    unreferenced objects

  - convert reachable.c to use traverse_commit_list instead of its own
    custom walk; this gives support for ignore_missing_links above, and
    saves us a fair bit of code.

  - while in the area, I noticed that reachable.c's reflog handling is
    the same as rev-list's --reflog option; it now builds on what's in
    revision.c.

That takes us up to patch 17. While working in reachable.c, I noticed an
oddity: we consider objects in the index to be reachable during prune
(which is good), but we do not when dropping them during a repack that
uses --unpack-unreachable=<expiration>. The remaining patches fix that,
which needed a fair bit of preparatory cleanup.

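To make the numeric-order traversal concrete, here is a rough Python
sketch of the idea (not the C code from the series). The function name
mirrors the one in the patches, but the signature, the callback
arguments, and the 38-hex-digit filename check are assumptions made for
illustration only. A non-zero return from either callback stops the walk
and is propagated to the caller.

```python
import os

HEX_DIGITS = set("0123456789abcdef")


def for_each_loose_file_in_objdir(objdir, file_cb, subdir_cb=None):
    """Walk objdir/00 .. objdir/ff in numeric order (illustrative only).

    file_cb(hex_name, path) is called for each file that looks like a
    loose object; subdir_cb(nr, path) is called once per existing
    subdirectory, after its files, with the directory's number.
    """
    for nr in range(256):
        subdir = os.path.join(objdir, "%02x" % nr)
        if not os.path.isdir(subdir):
            continue
        for name in sorted(os.listdir(subdir)):
            # Assumption: a loose object file is named by the remaining
            # 38 hex digits of its SHA-1; anything else is skipped here.
            if len(name) != 38 or not set(name) <= HEX_DIGITS:
                continue
            ret = file_cb("%02x%s" % (nr, name), os.path.join(subdir, name))
            if ret:
                return ret  # propagate the callback's return value
        if subdir_cb is not None:
            ret = subdir_cb(nr, subdir)
            if ret:
                return ret
    return 0
```

Because the subdir callback receives the directory number, a caller
that wants progress updates can report nr + 1 out of 256 without
parsing the path itself.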
I'm really beginning to question whether the "just drop objects that are
about to be pruned" optimization done in 7e52f56 (gc: do not explode
objects which will be immediately pruned, 2012-04-07) is worth it. It
really complicates things, as pack-objects and prune need to have the
exact same rules (and implementing it naively, by having pack-objects
run the same code as prune, is not desirable because pack-objects has
_already_ done a complete expensive traversal to generate the packing
list). And I fear it will get even worse if we implement some of the
race-condition fixes that Michael suggested earlier.

On the other hand, the loosening behavior without 7e52f56 has some
severe pathological cases. A repository which has had a chunk of history
deleted can easily increase in size by several orders of magnitude due
to loosening (since we lose the benefit of all deltas in the loosened
objects).

Finally, there are a few things that were discussed that I didn't
address/fix. I don't think any of them is a critical blocker, but I did
want to summarize the state:

  - when refreshing, we may update a pack's mtime multiple times. It
    probably wouldn't be too hard to cache this and only update once per
    program run, but I also don't think it's that big a deal in
    practice.

  - We will munge mtimes of objects found in alternates. If we don't
    have write access to the alternate, we'll create a local duplicate
    of the object. This is the safer thing, but I'm not sure if there
    are cases where we might try to write out a large number of objects
    which exist in an alternate (OTOH, we will eventually drop them at
    the next repack).

  - I didn't implement the "sort by inode" trick that fsck does when
    traversing the loose objects. It wouldn't be too hard, but I'm not
    convinced it's actually important.

  - I didn't convert fsck to the for_each_loose_file interface (mostly
    because I didn't do the inode-sorting trick, and while I don't think
    it matters, I didn't go to the work to show that it _doesn't_).

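For the first item above, the "only update once per program run" idea
could look roughly like the following Python sketch. This is not part
of the series; the function name freshen_once and the module-level
cache are hypothetical, and the sketch only assumes that freshening
means bumping a file's mtime to the current time.

```python
import os

# Hypothetical per-process cache of paths whose mtime we already bumped.
_freshened = set()


def freshen_once(path):
    """Bump path's mtime to "now" at most once per program run.

    Returns True if the file was freshened (now or earlier in this run),
    False if the mtime could not be updated (e.g. the file is missing).
    """
    if path in _freshened:
        return True  # already done in this run; skip the extra syscall
    try:
        os.utime(path, None)  # None means "set atime and mtime to now"
    except OSError:
        return False
    _freshened.add(path)
    return True
```

The trade-off is that a long-running process would not re-freshen a
pack whose mtime aged in the meantime, which is one reason to keep the
cache strictly per-run.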
Here are the patches:

  [01/25]: foreach_alt_odb: propagate return value from callback
  [02/25]: isxdigit: cast input to unsigned char
  [03/25]: object_array: factor out slopbuf-freeing logic
  [04/25]: object_array: add a "clear" function
  [05/25]: clean up name allocation in prepare_revision_walk
  [06/25]: reachable: use traverse_commit_list instead of custom walk
  [07/25]: reachable: reuse revision.c "add all reflogs" code
  [08/25]: prune: factor out loose-object directory traversal
  [09/25]: reachable: mark index blobs as SEEN
  [10/25]: prune-packed: use for_each_loose_file_in_objdir
  [11/25]: count-objects: do not use xsize_t when counting object size
  [12/25]: count-objects: use for_each_loose_file_in_objdir
  [13/25]: sha1_file: add for_each iterators for loose and packed objects
  [14/25]: prune: keep objects reachable from recent objects
  [15/25]: pack-objects: refactor unpack-unreachable expiration check
  [16/25]: pack-objects: match prune logic for discarding objects
  [17/25]: write_sha1_file: freshen existing objects
  [18/25]: make add_object_array_with_context interface more sane
  [19/25]: traverse_commit_list: support pending blobs/trees with paths
  [20/25]: rev-list: document --reflog option
  [21/25]: rev-list: add --index-objects option
  [22/25]: reachable: use revision machinery's --index-objects code
  [23/25]: pack-objects: use argv_array
  [24/25]: repack: pack objects mentioned by the index
  [25/25]: pack-objects: double-check options before discarding objects