Re: Creating objects manually and repack

Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> On 8/5/06, Martin Langhoff <martin.langhoff@xxxxxxxxx> wrote:
> >On 8/5/06, Jon Smirl <jonsmirl@xxxxxxxxx> wrote:
> >> On 8/4/06, Linus Torvalds <torvalds@xxxxxxxx> wrote:
> >> > and you're basically all done. The above would turn each *,v file into a
> >> > *-<sha>.pack/*-<sha>.idx file pair, so you'd have exactly as many
> >> > pack-files as you have *,v files.
> >>
> >> I'll end up with 110,000 pack files.
> >
> >Then just do it every 100 files, and you'll only have 1,100 pack
> >files, and it'll be fine.
> 
> This is something that has to be tuned. If you wait too long
> everything spills out of RAM and you go totally IO bound for days. If
> you do it too often you end up with too many packs and it takes a day
> to repack them.
> 
> If I had a way to pipe all of the objects into repack one at a
> time without repack doing multiple passes none of this tuning would be
> necessary. In this model the standalone objects never get created in
> the first place. The fastest IO is IO that has been eliminated.

I'm almost done with what I'm calling `git-fast-import`.  It takes
a stream of blobs on STDIN and writes the pack to a file, printing
SHA1s in hex format to STDOUT.  The basic format for STDIN is a
4-byte length (native format) followed by that many bytes of blob data.
It prints the SHA1 for that blob to STDOUT, then waits for another
length.
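
To make the framing concrete, here is a rough Python sketch of what a producer might write to STDIN and what SHA1 it could expect back.  It is not the git-fast-import code.  The names frame_blob, read_blobs and git_blob_sha1 are made up for illustration, and it assumes "native format" means an unsigned 32-bit length in host byte order.  The SHA1 is computed the way git names any blob: over the header "blob <size>\0" followed by the data.

```python
import hashlib
import io
import struct


def frame_blob(data: bytes) -> bytes:
    """Prefix a blob with its length.

    Assumption: the length is an unsigned 32-bit integer in host byte
    order ("=I" is native order, fixed 4-byte size).
    """
    return struct.pack("=I", len(data)) + data


def read_blobs(stream):
    """Yield each blob from a stream of length-prefixed records."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of input
        if len(header) != 4:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack("=I", header)
        data = stream.read(length)
        if len(data) != length:
            raise ValueError("truncated blob")
        yield data


def git_blob_sha1(data: bytes) -> str:
    """Return the hex SHA1 git assigns to a blob with this content."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


if __name__ == "__main__":
    # Two revisions framed back to back, as a producer would send them.
    wire = io.BytesIO(frame_blob(b"hello\n") + frame_blob(b""))
    for blob in read_blobs(wire):
        print(git_blob_sha1(blob))
```

A real producer would write the frame_blob() output to the importer's STDIN and read one hex SHA1 line back per blob before sending the next.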

It naively deltas each object against the prior object, thus it
would be best to feed it one ,v file at a time working from the most
recent revision back to the oldest revision.  This works well for
an RCS file as that's the natural order to process the file in.  :-)

When done you close STDIN and it'll rip through and update the pack
object count and the trailing checksum.  This should let you pack
the entire repository in delta format using only two passes over the
data: one to write out the pack file and one to compute its checksum.
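
As a sketch of that finishing step (again not the actual code, and start_pack/finish_pack are names invented here): a version 2 pack starts with a 12 byte header, "PACK", the version and the object count, both in network byte order, and ends with the SHA1 of everything before it.  So the writer can emit a header with a zero count, stream the objects, then patch the count and make one more pass to compute the trailer.  The object entry below is an opaque placeholder, not a valid pack entry.

```python
import hashlib
import io
import struct

# Signature, version, object count; the integers are in network byte order.
PACK_HEADER = struct.Struct(">4sII")


def start_pack(f):
    """Write a header with a placeholder object count of zero."""
    f.write(PACK_HEADER.pack(b"PACK", 2, 0))


def finish_pack(f, object_count):
    """Patch the object count, then append the trailing SHA1.

    f must be seekable and opened for both reading and writing.
    """
    # Fix up the header now that the real count is known.
    f.seek(0)
    f.write(PACK_HEADER.pack(b"PACK", 2, object_count))

    # Second pass: hash everything written so far.
    f.seek(0)
    digest = hashlib.sha1()
    for chunk in iter(lambda: f.read(65536), b""):
        digest.update(chunk)

    # Append the checksum at the end of the file.
    f.seek(0, io.SEEK_END)
    f.write(digest.digest())
    return digest.hexdigest()


if __name__ == "__main__":
    pack = io.BytesIO()
    start_pack(pack)
    pack.write(b"placeholder-object-entry")  # hypothetical stand-in for real entries
    print(finish_pack(pack, 1))
```

With a real file this would be opened in "w+b" mode so the same handle can be rewound and re-read for the checksum pass.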


I'll post the code in a couple of hours.

-- 
Shawn.