On 05/07/2013 09:12 AM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@xxxxxxxxxxxx> writes:
>
>>>>>> CVS stores all of the revisions of a single file in a single
>>>>>> filename,v file in rcsfile(5) format. The revisions are stored as
>>>>>> deltas ordered so that a single revision can be reconstructed from
>>>>>> a single serial read of the file.
>>>>>>
>>>>>> cvs2git reads each of these files once, reconstructing *all* of the
>>>>>> revisions for a file in a single go. It then pours them into a
>>>>>> git-fast-import stream as blobs and sets a mark on each blob.
>
> This is more or less off-topic, but in the bigger picture it is more
> interesting and important X-<.
>
> The way you describe how cvs2git handles the blobs is the more
> efficient way, given that fast-import does not even attempt to
> create good deltas. The only thing it does is to see if the current
> data deltifies against the last object.
>
> IIRC, CVS's backend storage is mostly recorded in backward delta, so
> if you are feeding the blob data from new to old, then the resulting
> pack would follow Linus's law (the file generally grows over time)
> and would generally give you a good deltified chain of objects.

Yes, you are correct about how CVS orders commits on the mainline.
Branches are stored in the opposite order (oldest to newest), but CVS
users don't tend to get carried away with branches, and if the changes
are small, deltification should help a lot anyway.

Cool. I knew that fast-import didn't do much in the way of compression,
but I didn't realize that it can compute deltas only between adjacent
blobs. So cvs2git happily hits the sweet spot of fast-import.

Michael

-- 
Michael Haggerty        mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
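
P.S. For anyone following along who hasn't looked at a fast-import
stream before, the blob-plus-mark records described above look roughly
like this (a hand-written sketch; the file name, contents, identity,
timestamp, and byte counts are all invented for illustration):

    blob
    mark :1
    data 19
    Hello, world again.
    blob
    mark :2
    data 13
    Hello, world.

    commit refs/heads/master
    mark :3
    committer Jane Doe <jane@example.com> 1367917920 +0000
    data 16
    Import from CVS.
    M 100644 :2 hello.txt

Each "data <n>" header gives the exact byte length of the payload that
follows, and a later commit can attach any previously marked blob to a
path (here :2, the older revision), regardless of the order in which
the blobs were emitted. That is also why the new-to-old trunk order
works out so well: adjacent blobs in the stream are adjacent revisions
of the same file, which is exactly the one case that fast-import's
deltify-against-the-last-object heuristic catches.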