Re: [PATCH] pack-objects: re-validate data we copy from elsewhere.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Linus Torvalds <torvalds@xxxxxxxx> writes:

> On Sat, 2 Sep 2006, Junio C Hamano wrote:
>> 
>> repack without -a essentially boils down to:
>> 
>> 	rev-list --objects --all --unpacked |
>>         pack-objects $new_pack
>> 
>> which picks up all live loose objects and create a new pack.
>> 
>> If rev-list had an extention that lets you say
>> 
>> 	rev-list --objects --all --unpacked=$active_pack |
>> 	pack-objects $new_pack
>
> I have to say, that adding another pack-objects-specific flag to rev-list 
> doesn't necessarily strike me as very nice.
>
> It might be much better to just teach "git pack-objects" to walk the 
> object list by hand. The whole "git-rev-list | git-pack-objects" approach 
> made more sense back when git-rev-list was the _only_ think listing 
> objects, but now that we've librarized the whole object listing thing, 
> maybe it's time to avoid the pipeline and just do it inside the packer 
> logic.
>
> Of course, having git-pack-objects be able to take input from stdin is 
> still useful, but I'd rather start moving the obviously packing-specific 
> flags out of git-rev-list, and into git-pack-objects instead.
>
> [ That would also potentially make packing more efficient - right now the 
>   "phase 1" of packing is literally to just figure out the type and the 
>   size of the object, in order to sort the object list.
>
>   So when you do a big repack, first "git-rev-list" ends up doing all this 
>   work, and then "git-pack-object" ends up _redoing_ some of it. 
>
>   Especially for tree objects (which are one of the most common kinds), we 
>   already opened the object once when we traversed it, so opening it again 
>   just to look at its size is kind of sad.. ]

Well, I tried (see two patches from tonight, and the result is
sitting in "pu"), but it turns out that the information
pack-objects wants to get in check_object() is somewhat
different from what "struct object" family is willing to give
back easily during the traversal done by get_revision()
interface.

Since "struct object" traditionally has not had provision of
recording size (created_object() interface just takes sha1
without size, for example), and it was slimmed down to contain
absolute minimum information, I am reluctant to add fields to
record the size there.  Also having in-pack size available would
be rather nice but the functions involved in the call chain
(including process_tree and process_blob) did not even care
where the objects are stored, so it would involve rather nasty
surgery.  So I stopped short of that.  I am not quite sure how
useful tonight's patches are in practice.  It gets rid of the
pipe, but moves the revision walker memory pressure from rev-list
to pack-objects itself, so...





-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]