Re: [PATCH] pack-objects: re-validate data we copy from elsewhere.

Shawn Pearce <spearce@xxxxxxxxxxx> · Mon, 4 Sep 2006 02:44:43 -0400

Junio C Hamano <junkio@xxxxxxx> wrote:
> Now if we fix dumb transport downloaders, then we could even
> make a convention that the packs named pack-[0-9a-f]{40}.pack
> are archive packs.  And git-repack can even have a convention
> that .git/objects/pack/pack-active.(pack|idx) is the active
> pack.

Seems reasonable.
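The naming convention Junio proposes can be written as a small classifier. This is a minimal sketch: `classify_pack` and its return labels are hypothetical, and it looks only at file names, so it assumes any pack-active.pack present really is the active pack.

```python
import re

# Archive pack under the proposed convention: pack-<40 hex digits>.pack
ARCHIVE_RE = re.compile(r"pack-[0-9a-f]{40}\.pack")


def classify_pack(filename: str) -> str:
    """Return 'active', 'archive' or 'other' for a file in objects/pack.

    Hypothetical helper: applies the naming convention only and does not
    open or verify the file.
    """
    if filename == "pack-active.pack":
        return "active"
    if ARCHIVE_RE.fullmatch(filename):
        return "archive"
    return "other"


print(classify_pack("pack-active.pack"))             # active
print(classify_pack("pack-" + "ab" * 20 + ".pack"))  # archive
print(classify_pack("pack-active.idx"))              # other
```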

I take it you are proposing that a dumb transport always saves the
remote pack-active.pack locally as pack-n{40}.pack, where the
dumb-protocol downloader computes the correct pack name from its
contents.  Thus any remote pack downloaded over a dumb transport is
automatically treated as a historical pack by the receiving repository.
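A minimal sketch of how a downloader might compute that name, assuming the convention of the time that a pack's name is the SHA-1 over the sorted 20-byte binary object names it contains (the function name is hypothetical):

```python
import hashlib


def pack_name_from_objects(object_ids):
    """Derive a pack base name from the object names the pack contains.

    Assumes: SHA-1 of the sorted 20-byte binary object names, written as
    40 hex digits after "pack-".
    """
    h = hashlib.sha1()
    for raw in sorted(bytes.fromhex(oid) for oid in object_ids):
        h.update(raw)
    return "pack-" + h.hexdigest()


# A downloader would then store the remote pack-active.pack under this name.
name = pack_name_from_objects(["22" * 20, "11" * 20])
print(name + ".pack")
```

Since only the object names are hashed, the same set of objects always yields the same name, whatever order or deltification the pack itself uses.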

This means someone tracking a remote repository over a dumb
transport would need to frequently repack just a subset of their
historical packs into their own active.pack, while leaving the other
historical packs untouched.

But the more I think about it, the less either solution (an active
pack symref or pack-active.pack) seems to really solve this.  Being
limited to just one active pack seems to be a problem, at least for
the dumb transports.

I think that's why I preferred the size threshold idea.  The active
packs are cheap to repack because they are small.  The larger
packs aren't cheap to repack because they are large - and probably
historical.  What we are trying to get is fast repacks for the
active objects while still getting full validation anytime we do a
repack and (possibly) destroy the source.  A size threshold does it.
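A size threshold could be applied along these lines. This is a sketch only: the `Pack` record, `split_by_size` and the 32 MiB default are hypothetical, since no specific cut-off is proposed here.

```python
from dataclasses import dataclass

# Hypothetical cut-off; the message does not propose a value.
DEFAULT_THRESHOLD = 32 * 1024 * 1024  # 32 MiB


@dataclass
class Pack:
    name: str
    size: int  # size of the .pack file in bytes


def split_by_size(packs, threshold=DEFAULT_THRESHOLD):
    """Split packs into (to_repack, to_keep).

    Small packs are treated as active and cheap to repack; large ones
    are assumed to be historical and are left alone.
    """
    to_repack = [p for p in packs if p.size < threshold]
    to_keep = [p for p in packs if p.size >= threshold]
    return to_repack, to_keep


packs = [Pack("pack-small", 2 * 1024 * 1024), Pack("pack-mozilla", 450 * 1024 * 1024)]
to_repack, to_keep = split_by_size(packs)
print([p.name for p in to_repack])  # ['pack-small']
print([p.name for p in to_keep])    # ['pack-mozilla']
```

Nothing has to be recorded about which packs are historical; the split falls out of file sizes alone, which is what makes it work even with several small packs arriving over a dumb transport.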

When Jon Smirl and I started kicking around the idea of a historical
pack for Mozilla I was thinking of just storing a list of pack base
names in ".git/objects/pack/historical".  Packs listed there should
generally be exempt from repacking.  During an initial clone we'd
need to deliver the contents of that file to the new repository,
since if the source considers a pack historical, it's likely the new
repository would want to as well.
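Reading such a list might look like the following sketch. The file format (one base name per line, with blank lines and `#` comments ignored) and all helper names are assumptions; the only given is that the file lists pack base names.

```python
from pathlib import Path


def parse_historical(text: str) -> set:
    """Parse the hypothetical objects/pack/historical file.

    Assumed format: one pack base name per line; blank lines and lines
    starting with '#' are ignored.
    """
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.add(line)
    return names


def load_historical(git_dir: str) -> set:
    """Read objects/pack/historical; a missing file means no historical packs."""
    path = Path(git_dir) / "objects" / "pack" / "historical"
    return parse_historical(path.read_text()) if path.exists() else set()


def packs_to_repack(all_bases, historical):
    """Every pack not listed as historical is a repack candidate."""
    return [base for base in all_bases if base not in historical]
```

During a clone, the source's file would be copied verbatim so the new repository starts out exempting the same packs.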

But now as I write this email I'm thinking that it may be just as
easy to change the base name of the pack to "hist-n{40}" when we
want to consider it historical.
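Marking a pack then becomes a rename. A minimal sketch; `historical_base`, `is_historical` and `mark_historical` are hypothetical, and it assumes the rest of the tooling would be taught to recognize the hist- prefix. The two renames are not atomic as a pair, so a concurrent reader could briefly see one file of the pair without the other.

```python
import os
import re

PACK_BASE_RE = re.compile(r"pack-([0-9a-f]{40})")


def historical_base(base: str) -> str:
    """Map pack-<sha1> to hist-<sha1>; reject anything else."""
    m = PACK_BASE_RE.fullmatch(base)
    if m is None:
        raise ValueError(f"not a pack base name: {base!r}")
    return "hist-" + m.group(1)


def is_historical(base: str) -> bool:
    """A pack is historical if its base name carries the hist- prefix."""
    return base.startswith("hist-")


def mark_historical(pack_dir: str, base: str) -> str:
    """Rename <base>.pack and <base>.idx in pack_dir to the hist- name."""
    new_base = historical_base(base)
    for ext in (".pack", ".idx"):
        os.replace(os.path.join(pack_dir, base + ext),
                   os.path.join(pack_dir, new_base + ext))
    return new_base
```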

[snipped and re-ordered]
> It first downloads the .idx files, so it can compute the
> _right_ packname using the sorted object names recorded there

Why trust the .idx?  I've seen you post that the .idx is purely
a local matter.  The "smart" Git protocol only receives the .pack
from the remote and either computes the .idx locally or unpacks the
pack into loose objects; why should a dumb transport trust the remote
.idx?
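For reference, this is roughly how much a receiver can check about a remote .idx without reading the .pack. The sketch assumes the version-1 index layout (a 256-entry fan-out table, then 24 bytes per object, then two SHA-1 trailers); `check_idx_v1` is hypothetical. Passing these checks shows only that the index is internally consistent, not that its offsets or object list actually describe the .pack it claims to index.

```python
import hashlib
import struct


def check_idx_v1(data: bytes) -> list:
    """Sanity-check a version-1 pack index and return its object names (hex).

    Assumed layout: 256 big-endian uint32 fan-out entries; N entries of
    (uint32 offset, 20-byte object name); the .pack SHA-1; and the SHA-1
    of everything before it.
    """
    if len(data) < 1024 + 40:
        raise ValueError("too short")
    fanout = struct.unpack(">256I", data[:1024])
    n = fanout[255]
    if len(data) != 1024 + 24 * n + 40:
        raise ValueError("size does not match object count")
    if hashlib.sha1(data[:-20]).digest() != data[-20:]:
        raise ValueError("idx checksum mismatch")
    names = [data[1024 + 24 * i + 4:1024 + 24 * i + 24] for i in range(n)]
    for prev, cur in zip(names, names[1:]):
        if prev >= cur:
            raise ValueError("object names not strictly sorted")
    # fanout[i] must equal the number of names whose first byte is <= i.
    counts = [0] * 256
    for name in names:
        counts[name[0]] += 1
    total = 0
    for i in range(256):
        total += counts[i]
        if fanout[i] != total:
            raise ValueError("fan-out table inconsistent")
    return [name.hex() for name in names]
```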

Oh, I know: when the .idx is >50 MiB, the .pack is >450 MiB, with
2 million objects and delta chains ~5000 long.
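To put those numbers side by side: assuming the version-1 .idx layout (1024-byte fan-out table, 24 bytes per object, two 20-byte SHA-1 trailers), the index grows by exactly 24 bytes per object, so its size pins down the object count. The helper names are hypothetical.

```python
MiB = 1024 * 1024


def idx_v1_size(num_objects: int) -> int:
    """Bytes in a version-1 .idx: fan-out + 24 bytes per object + 2 SHA-1s."""
    return 256 * 4 + 24 * num_objects + 2 * 20


def objects_for_idx_size(idx_bytes: int) -> int:
    """Inverse of idx_v1_size, rounded down."""
    return (idx_bytes - 256 * 4 - 2 * 20) // 24


size = idx_v1_size(2_000_000)
print(size, round(size / MiB, 1))       # 48001064 45.8
print(objects_for_idx_size(50 * MiB))   # 2184489
```

So a >50 MiB index corresponds to roughly 2.2 million objects or more, in line with the figures above.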

Are we thinking that .idx files may need to have a slightly wider
distribution than "local"?

-- 
Shawn.
