Re: fetching packs and storing them as packs

Shawn Pearce <spearce@xxxxxxxxxxx> · Sat, 28 Oct 2006 03:21:46 -0400

Junio C Hamano <junkio@xxxxxxx> wrote:
> Linus Torvalds <torvalds@xxxxxxxx> writes:
> > I would MUCH rather we just rename the index/pack file to something that 
> > git can _use_, but that "git repack -a -d" won't remove....
> 
> Two points.
> 
> The "locking" I mentioned was between receive-pack and repack -a
> -d; upload-pack (what millions people are using to read from the
> repository you are pushing into) is not affected.  So in that
> sense, we can afford to use lock without much contention.

And giving how difficult locking is to get on most filesystems I'd
just rather avoid any sort of locking whenever possible.  That's one
reason why reflog is 1 file per ref and not 1 file per repository...

> I just thought of a cute hack that does not involve renaming
> packs at all (so no need to match new-pack-X.pack with
> pack-X.idx), and Shawn's sequence actually would work, which is:

I take this above statement to mean that you answered your own
question about how my sequence is able to resolve the race condition?

> The receive-pack side:
> 
>   a. Create temporary pack file in $GIT_DIR/objects/pack_XXXXX.
>   b. Create temporary index file in $GIT_DIR/objects/index_XXXXX.
>   c. Write pack and index, in "inactive" state.
>   d. Move pack to $GIT_DIR/objects/pack/...
>   e. Move idx to $GIT_DIR/objects/pack...
>   f. Update refs.
>   g. Mark new pack and idx as "active".
> 
> The "repack -a -d" side:
> 
>   1. List all active packs and store in memory.
>   2. Repack only loose objects and objects contained in active packs.
>   3. Move new pack and idx into $GIT_DIR/objects/pack/...
>   4. Mark new pack and idx as "active".
>   5. Delete active packs found by step #1.
> 
> Pack-idx pair is marked "active" by "chmod u+s" the .pack file.
> During the normal operation, all .pack/.idx pair in objects/pack/
> directories are usable regardless of the setuid bit; we would
> never make .pack files executable so u+s would not otherwise
> hurt us either.  "active" probably is better read as "eligible
> for repacking".

As cool as that trick is I'm against using the file mode as a
way to indicate the status of a pack file.  For one thing not
every filesystem that Git is used on handles file modes properly.
We already have core.filemode thanks to some of those and I use
Git on at least one of those "not so friendly" filesystems...

Why not just use create a new flag file?

Lets say that a pack X is NOT eligible to be repacked if
"$GIT_DIR/objects/pack/pack-X.keep" exists.

Thus we want to have the new ".keep" file for historical packs and
incoming receive-pack between steps c and g.  In the former case
the historical pack is already "very large" and thus one additional
empty file to indicate we want to retain that pack as-is is trivial
overhead (relatively speaking); in the latter case the lifespan of
the file is relatively short and thus any overhead associated with it
on the local filesystem is free (it may never even hit the platter).

In the sequence above we create pack-X.keep between steps b and c
during receive-pack ensuring that even before the pack is usable by
a Git reader process that it can't be swept up by a `repack -a -d`
and we delete the pack-X.keep file in step g to mark it active.

Further only repack and the receive-pack side code changes: all
existing packs are automatically taken to be active while only
packs coming in from receive-pack or those marked by a human as
"historical" will be kept.

Two birds, one stone.  Thoughts?

-- 
Shawn.
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html