Re: [PATCH] pack-objects: re-validate data we copy from elsewhere.

Shawn Pearce <spearce@xxxxxxxxxxx> writes:

> Junio C Hamano <junkio@xxxxxxx> wrote:
>...
>> You could introduce repack.active in .git/config that points at
>> the latest active, make git-repack to notice and update when it
>> updates it.
>...
> Better that it update a symref like thing instead.  For example
> create a ".git/objects/pack/active" file holding the name
> ("pack-n{40}.pack") of the current active pack.  If this file is
> missing choose the pack that is smallest as the active pack.

Actually I like the one you did not quote even better the more I
think about it.

>> We could also just use .git/objects/pack/pack-active.{pack,idx}
>> files.  This needs some surgery to get rid of packed_git.sha1[],
>> sha1_pack_name() and friends and have them only require .pack
>> and .idx are linked by their basename only as was discussed in a
>> separate thread to make it dumb-transport friendly.

The reason (rather, excuse) given for letting the dumb transport
routines rely on packed_git.sha1[] and sha1_pack_name() was that
we need to avoid packname collisions _anyway_, so relying on the
"pack-[0-9a-f]{40}.(pack|idx)" naming convention is OK, or even
desirable.

We do need to avoid packname collisions, and it is acceptable to
assume that SHA-1 is practically collision free, as the rest of
the system does.  However, if the dumb transport wants to avoid
packname collisions, it should not rely on how the other side
names its packs.  It downloads the .idx files first, so it can
compute the _right_ packname from the sorted object names
recorded there [*1*] and store the downloaded pack/idx under
that name.  It still needs to rely on the remote names ending
with .pack and .idx, and on a .pack and its .idx sharing a
basename.  But the point is that it should not depend on more
than that; in particular, not on the basename being of the form
"pack-[0-9a-f]{40}", nor on the hex part being the correct
packname.

Now if we fix dumb transport downloaders, then we could even
make a convention that the packs named pack-[0-9a-f]{40}.pack
are archive packs.  And git-repack can even have a convention
that .git/objects/pack/pack-active.(pack|idx) is the active
pack.
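To make the proposed naming convention concrete, here is a rough
sketch of how a tool could tell the two kinds of packs apart by
filename alone.  The function name classify_pack_file() and the
"active"/"archive"/"other" labels are made up for illustration;
nothing like this exists in git today.

```python
import re

# Hypothetical patterns following the convention proposed above.
ARCHIVE_RE = re.compile(r"pack-[0-9a-f]{40}\.(pack|idx)")
ACTIVE_RE = re.compile(r"pack-active\.(pack|idx)")

def classify_pack_file(filename):
    """Classify a file under .git/objects/pack/ by its name only."""
    if ACTIVE_RE.fullmatch(filename):
        return "active"
    if ARCHIVE_RE.fullmatch(filename):
        return "archive"
    return "other"
```

Anything that matches neither pattern is reported as "other", so
a stray file is never mistaken for the active pack.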

[Footnote]

*1* A refresher on packname generation: it is the SHA-1 over the
object names (20-byte binary representation) in the pack, sorted
in byte order.  See builtin-pack-objects.c for details.
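
The computation described in the footnote can be sketched as
follows.  This is only an illustration of the description above,
not a copy of what builtin-pack-objects.c does; compute_packname()
is a made-up name, and it assumes the caller has already extracted
the object names from the .idx as 40-character hex strings.

```python
import hashlib

def compute_packname(hex_object_names):
    """Return the hex packname for a collection of object names."""
    # Convert each name to its 20-byte binary form.
    raw_names = [bytes.fromhex(name) for name in hex_object_names]
    for raw in raw_names:
        if len(raw) != 20:
            raise ValueError("object name is not 20 bytes")
    # Sort in byte order, then hash the concatenation.
    digest = hashlib.sha1()
    for raw in sorted(raw_names):
        digest.update(raw)
    return digest.hexdigest()
```

A downloader could then store the files as
"pack-" + compute_packname(names) + ".pack" (and ".idx"),
whatever the other side happened to call them.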

