Re: Why so much time in the kernel?

Keith Packard <keithp@xxxxxxxxxx> · Fri, 16 Jun 2006 10:29:28 -0700

On Fri, 2006-06-16 at 13:00 -0400, Jon Smirl wrote:
> Is it a crazy idea to read the cvs files, compute an sha1 on each
> expanded delta and then write the delta straight into a pack file? Are
> the cvs and git delta formats the same? What about CVS's forward and
> reverse delta use?

At this point, merging blobs into packs isn't a significant part of the
computational cost. parsecvs is spending all of its time in the
quadratic traversal of the diff chains; fixing that to emit all of the
versions in a single pass should speed up that part of the conversion
process dramatically.

>  While this is going on, track the
> branches/changsets in memory and then finish up by writing these trees
> into the pack file too. This should take no more ram than cvsps needs
> currently.

cvsps drops too much state on the floor making branch point and branch
contents inaccurate. What I'm hoping is that I can figure out a way to
discard most of the per-version information by computing tree objects in
reverse order, saving only the tree sha1 and other per-commit info, then
stitch the commits together using that, without needing the full
per-file data.

-- 
keith.packard@xxxxxxxxx
Attachment:
signature.asc

Description: This is a digitally signed message part