Linus Torvalds <torvalds@xxxxxxxx> writes: > On Sat, 2 Sep 2006, Junio C Hamano wrote: >> >> repack without -a essentially boils down to: >> >> rev-list --objects --all --unpacked | >> pack-objects $new_pack >> >> which picks up all live loose objects and create a new pack. >> >> If rev-list had an extention that lets you say >> >> rev-list --objects --all --unpacked=$active_pack | >> pack-objects $new_pack > > I have to say, that adding another pack-objects-specific flag to rev-list > doesn't necessarily strike me as very nice. > > It might be much better to just teach "git pack-objects" to walk the > object list by hand. The whole "git-rev-list | git-pack-objects" approach > made more sense back when git-rev-list was the _only_ think listing > objects, but now that we've librarized the whole object listing thing, > maybe it's time to avoid the pipeline and just do it inside the packer > logic. > > Of course, having git-pack-objects be able to take input from stdin is > still useful, but I'd rather start moving the obviously packing-specific > flags out of git-rev-list, and into git-pack-objects instead. > > [ That would also potentially make packing more efficient - right now the > "phase 1" of packing is literally to just figure out the type and the > size of the object, in order to sort the object list. > > So when you do a big repack, first "git-rev-list" ends up doing all this > work, and then "git-pack-object" ends up _redoing_ some of it. > > Especially for tree objects (which are one of the most common kinds), we > already opened the object once when we traversed it, so opening it again > just to look at its size is kind of sad.. ] Well, I tried (see two patches from tonight, and the result is sitting in "pu"), but it turns out that the information pack-objects wants to get in check_object() is somewhat different from what "struct object" family is willing to give back easily during the traversal done by get_revision() interface. Since "struct object" traditionally has not had provision of recording size (created_object() interface just takes sha1 without size, for example), and it was slimmed down to contain absolute minimum information, I am reluctant to add fields to record the size there. Also having in-pack size available would be rather nice but the functions involved in the call chain (including process_tree and process_blob) did not even care where the objects are stored, so it would involve rather nasty surgery. So I stopped short of that. I am not quite sure how useful tonight's patches are in practice. It gets rid of the pipe, but moves the revision walker memory pressure from rev-list to pack-objects itself, so... - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html