Re: fetching packs and storing them as packs

Eran Tromer <git2eran@xxxxxxxxxx> wrote:
> On 2006-10-27 05:00, Shawn Pearce wrote:
> >> Change git-repack to follow references under $GIT_DIR/tmp/refs/ too.
> >> To receive or fetch a pack:
> >> 1. Add references to the new heads in
> >>    `mktemp $GIT_DIR/tmp/refs/XXXXXX`.
> >> 2. Put the new .pack under $GIT_DIR/objects/pack/.
> >> 3. Put the new .idx under $GIT_DIR/objects/pack/.
> >> 4. Update the relevant heads under $GIT_DIR/refs/.
> >> 5. Delete the references from step 1.
> 
> > That was actually my (and also Sean's) solution.  Except I would
> > put the temporary refs as "$GIT_DIR/refs/ref_XXXXXX" as this is
> > less code to change and it's consistent with how temporary loose
> > objects are created.
> 
> If you do that, other programs (e.g., anyone who uses rev-list --all)
> may try to walk those heads or consider them available before the pack
> is really there. The point about $GIT_DIR/tmp/refs is that only programs
> meddling with physical packs (git-fetch, git-receive-pack, git-repack)
> will know about it.
 
Doh.  Yes, of course, that makes much sense.

Hmm... Looking at git-repack we have two things currently pending
to rework in there:

  - Historical vs. active packs.
  - Don't delete a possibly still incoming pack during -d.

These have a lot of the same implementation issues.  We need to
be able to identify a set of packs which should be allowed for
repack with -a, and allowed for removal with -d if -a was also used.
A newly uploaded pack cannot be in that list unless its contents are
referenced by one or more refs (which implies that the receive-pack
process has completed).

I'm thinking that the ref thing might be unnecessary.  We just
need to fix repack so it builds a list of "active packs" whose
objects should be copied into the new pack, and then only packs
loose objects and those objects contained by an active pack.

So the receive-pack process becomes:

  a. Create temporary pack file in $GIT_DIR/objects/pack_XXXXX.
  b. Create temporary index file in $GIT_DIR/objects/index_XXXXX.
  c. Write pack and index.
  d. Move pack to $GIT_DIR/objects/pack/...
  e. Move index to $GIT_DIR/objects/pack/...
  f. Update refs.
  g. Arrange for new pack and index to be considered active.
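
As a rough sketch of steps a through e only (Python purely for
illustration; the function name store_pack and its arguments are made
up here, and steps f and g are left out because how a pack becomes
"active" is still the open question):

```python
import os
import tempfile


def store_pack(git_dir, name, pack_bytes, idx_bytes):
    """Write a pack and its index to temp files, then move both into place."""
    objects = os.path.join(git_dir, "objects")
    pack_dir = os.path.join(objects, "pack")
    os.makedirs(pack_dir, exist_ok=True)

    # a, b: temp files are created in objects/, not in objects/pack/.
    pack_fd, pack_tmp = tempfile.mkstemp(prefix="pack_", dir=objects)
    idx_fd, idx_tmp = tempfile.mkstemp(prefix="index_", dir=objects)

    # c: write both files completely and flush them to disk.
    for fd, data in ((pack_fd, pack_bytes), (idx_fd, idx_bytes)):
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())

    # d, e: move the pack first and the index second, so anything that
    # notices the .idx should find the matching .pack already present.
    pack_path = os.path.join(pack_dir, name + ".pack")
    idx_path = os.path.join(pack_dir, name + ".idx")
    os.rename(pack_tmp, pack_path)
    os.rename(idx_tmp, idx_path)
    return pack_path, idx_path
```

The renames stay within one filesystem because the temp files live
under $GIT_DIR/objects, which is the reason for creating them there
rather than in a system temp directory.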

And the repack -a -d process becomes:

  1. List all active packs and store in memory.
  2. Repack only loose objects and objects contained in active packs.
  3. Move new pack and idx into $GIT_DIR/objects/pack/...
  4. Arrange for new pack and idx to be considered active.
  5. Delete active packs found by step #1.
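
The important part of that sequence is that step 5 only touches the
list captured in step 1.  A minimal sketch, assuming two hypothetical
callbacks: list_active_packs returns pack base names, and build_pack
does steps 2 through 4 and returns the new pack's base name:

```python
import os


def repack_all_and_prune(pack_dir, list_active_packs, build_pack):
    """Repack the active packs, then delete only the packs seen up front."""
    # 1: snapshot the active set before any new pack can arrive.
    old_names = list(list_active_packs(pack_dir))

    # 2-4: hypothetical callback; writes, installs and activates the new pack.
    new_name = build_pack(old_names)

    # 5: delete only what was in the snapshot.  Packs that showed up
    # while we were repacking are not in old_names, so they survive.
    for name in old_names:
        if name == new_name:
            continue  # guard in case the new pack reuses an old name
        for ext in (".idx", ".pack"):
            try:
                os.unlink(os.path.join(pack_dir, name + ext))
            except FileNotFoundError:
                pass
    return new_name
```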

Junio was originally considering making historical packs
historical by placing their names into an information file (such as
`$GIT_DIR/objects/info/historical-packs`) and then considering all
other packs as active.  Thus step #1 lists all packs and removes those
whose names appear in historical-packs, while step #4 is unnecessary.
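
Under that scheme step #1 might look roughly like the following.  The
file format is my assumption (one pack base name per line, without the
.pack extension); nothing has been decided about it:

```python
import os


def list_active_packs(git_dir):
    """Return base names of all packs not listed in historical-packs."""
    pack_dir = os.path.join(git_dir, "objects", "pack")
    info = os.path.join(git_dir, "objects", "info", "historical-packs")

    # Assumed format: one pack base name per line, blank lines ignored.
    try:
        with open(info) as f:
            historical = {line.strip() for line in f if line.strip()}
    except FileNotFoundError:
        historical = set()  # no file means no historical packs

    names = (n[:-len(".pack")] for n in os.listdir(pack_dir)
             if n.endswith(".pack"))
    return sorted(n for n in names if n not in historical)
```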

I was thinking about just changing the "pack-" prefix to "hist-" for
the historical packs and assuming all "pack-*.pack" to be active.
Thus step #1 is a simple glob on the pack directory and step #4
is unnecessary.
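
With the prefix scheme step #1 really is just a glob.  A sketch (the
function name is made up for illustration):

```python
import glob
import os


def list_active_packs(pack_dir):
    """Return base names of packs whose file name starts with "pack-"."""
    paths = glob.glob(os.path.join(pack_dir, "pack-*.pack"))
    # Strip the directory and the ".pack" suffix, leaving e.g. "pack-1234".
    return sorted(os.path.basename(p)[:-len(".pack")] for p in paths)
```

Anything named with another prefix simply never matches, so there is
no separate list to keep in sync with the directory.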

In the latter case it's easy to mark an existing pack as historical
(just hardlink hist- names for pack, then idx, then unlink previous
names) and it's also easy to mark new incoming packs as non-active
by using a different prefix (e.g. "incm-") during steps d/e and
then relinking them as "pack-" during step g.  It's also very safe
on systems that support hardlinks.
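
The relinking could be sketched like this.  relink_pack is a
hypothetical helper; the link order (pack, then idx) follows the
description above, while the order of the unlinks is just my choice:

```python
import os


def relink_pack(pack_dir, base, old_prefix, new_prefix):
    """Give a pack and its index a new prefix via hardlink, then unlink."""
    if not base.startswith(old_prefix):
        raise ValueError("%s does not start with %s" % (base, old_prefix))
    new_base = new_prefix + base[len(old_prefix):]

    # Create the new names first: pack, then idx.  Both names refer to
    # the same file data, so the old names stay valid throughout.
    for ext in (".pack", ".idx"):
        os.link(os.path.join(pack_dir, base + ext),
                os.path.join(pack_dir, new_base + ext))

    # Only after both new names exist do the previous names go away.
    for ext in (".pack", ".idx"):
        os.unlink(os.path.join(pack_dir, base + ext))
    return new_base
```

So relink_pack(pack_dir, "pack-1234", "pack-", "hist-") would retire a
pack, and relink_pack(pack_dir, "incm-1234", "incm-", "pack-") would
activate an incoming one.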

We shouldn't ever need to worry about race conditions with repacking
historical packs.  For starters historical packs will tend to be
several years' worth of object accumulation and will be so large
that repacking them might take 45 minutes or more.  Thus they
probably will never get repacked.  An active pack will simply move
into historical status after it gets so large that it's no longer
worthwhile to keep repacking it.  They also will tend to have objects
that are so old that at least one ref in the repository will point
at their entire DAG and thus everything would carry over on a repack.

So this would be cleaner than messing around with temporary refs and
gets us the historical pack feature we've been looking to implement.

-- 
Shawn.