Re: [PATCH] pack-objects --repack-unpacked

Nicolas Pitre <nico@xxxxxxx> wrote:
> On Sat, 8 Sep 2007, Junio C Hamano wrote:
> > This actually was meant to be used to sort object entries from
> > multiple packs together.  The update to pack-objects you are
> > commenting on deals with one packfile at a time, but I think we
> > probably should collect from all packs and then sort (which was
> > how merge-pack used this function).
> 
> I'm not sure sorting objects from multiple packs together like that is 
> going to help deltification.  It is unlikely that related objects (e.g. 
> objects having the same path) will be located at the same offset in 
> different packs.

Yes.  But when you are merging several packfiles together and you
don't supply `--no-reuse-delta` then we're really just going to
copy the data from the sources to the output.  There is not a lot
of deltification to be performed; maybe only a handful of loose
objects will need to locate deltas.  So helping deltification is
not really of concern here.

What Junio is trying to do here is at least preserve the objects'
order within each source packfile, as that should help to preserve
their locality of access.

Only I'm not sure that's the best merging strategy available to us.

What about something like this:

  1) Read all packfile indexes, sort by offset.

  2) Locate first commit object within each packfile.
  3) Get that commit's commit date; if no commit is in the
     packfile at all use the modification date of the packfile.
  4) Sort the packfiles by their chosen date descending (more
     recent items are closer to the front of the list).

  5) Add objects:
     foreach type in commit tree blob tag
       foreach packfile in sorted_packs_from_4
         while current_object->type == $type
           if (current_object->flags & ADDED) == 0
             add current_object
           current_object++

This way data is still organized by the original order that rev-list
gave us when we created the small packfiles, but we also try to place
data from more recent packfiles into the front of the new packfile.
It's a rough approximation of what rev-list would have given us for
object ordering when it performed a traversal.  It's also a whole lot
cheaper than rev-list and lets us continue to include unreachable
objects, which was the point of this patch.
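
To make the ordering concrete, here is a rough Python sketch of steps
1-5.  It is only an illustration, not pack-objects code: `PackObject`,
`Pack`, `pack_date` and `merge_order` are made-up names, and the object
lists are assumed to have been read from the pack indexes already.  One
deliberate difference from the pseudocode: instead of advancing a
per-pack cursor, it filters each pack by type, which should give the
same result whenever a pack's objects are already grouped by type.

```python
from dataclasses import dataclass
from typing import List, Optional

# Type order taken from step 5 of the proposal.
TYPE_ORDER = ("commit", "tree", "blob", "tag")


@dataclass(frozen=True)
class PackObject:
    """Hypothetical stand-in for one pack index entry."""
    oid: str
    type: str
    offset: int
    commit_date: Optional[int] = None  # only meaningful for commits


@dataclass
class Pack:
    """Hypothetical stand-in for one packfile."""
    name: str
    mtime: int
    objects: List[PackObject]


def by_offset(pack: Pack) -> List[PackObject]:
    # Step 1: objects of a pack in on-disk (offset) order.
    return sorted(pack.objects, key=lambda o: o.offset)


def pack_date(pack: Pack) -> int:
    # Steps 2-3: date of the first commit in the pack, falling back
    # to the packfile's modification time if it holds no commit.
    for obj in by_offset(pack):
        if obj.type == "commit" and obj.commit_date is not None:
            return obj.commit_date
    return pack.mtime


def merge_order(packs: List[Pack]) -> List[str]:
    # Step 4: most recent packs first.
    sorted_packs = sorted(packs, key=pack_date, reverse=True)

    # Step 5: emit by type, then by pack, then by offset, skipping
    # anything already added (the ADDED flag in the pseudocode).
    added = set()
    order = []
    for obj_type in TYPE_ORDER:
        for pack in sorted_packs:
            for obj in by_offset(pack):
                if obj.type == obj_type and obj.oid not in added:
                    added.add(obj.oid)
                    order.append(obj.oid)
    return order


old = Pack("old", mtime=50, objects=[
    PackObject("c1", "commit", 12, commit_date=100),
    PackObject("t1", "tree", 40),
    PackObject("b1", "blob", 80),
])
new = Pack("new", mtime=60, objects=[
    PackObject("c2", "commit", 12, commit_date=200),
    PackObject("t2", "tree", 30),
    PackObject("b1", "blob", 70),  # duplicate of an object in "old"
    PackObject("b2", "blob", 90),
])
no_commit = Pack("no_commit", mtime=150, objects=[
    PackObject("b3", "blob", 12),
])

print(merge_order([old, new, no_commit]))
```

With this toy input the packs sort as new, no_commit, old, and the
duplicate blob is taken from the most recent pack that has it.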

-- 
Shawn.