Re: fast-import deltas

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 01 Apr 2014 10:14:02 -0700

Mike Hommey <mh@xxxxxxxxxxxx> writes:

> On Tue, Apr 01, 2014 at 09:15:12AM -0400, Jeff King wrote:
>> > It seems to me fast-import keeps a kind of human readable format for its
>> > protocol, i wonder if xdelta format would fit the bill. That being said,
>> > I also wonder if i shouldn't just try to write a pack on my own...
>> 
>> The fast-import commands are human readable, but the blob contents are
>> included inline. I don't see how sending a binary delta is any worse
>> than sending a literal binary blob over the stream.
>
> OTOH, the xdelta format is not exactly straightforward to produce, with
> the variable length encoding of integers. Not exactly hard, but when
> everything else in fast-import is straightforward, one has to wonder.

Unless you already have your change as an xdelta on hand, or the
format your foreign change is in gives sufficient information to
produce a corresponding xdelta without looking at the content that
your foreign change applies to, it is silly to try to convert your
foreign change into xdelta and feed it to fast-import.

What constitutes "sufficient" information?  The xdelta format is a
series of instructions applied to an existing piece of content (the
"source material", identified by its object name) to produce the
result of "applying delta".  Each instruction either:

 - copies N bytes from a given offset in the source material to the
   destination; or
 - copies N literal bytes, carried in the delta itself, to the
   destination.
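
To make the two instruction kinds concrete, here is a minimal sketch
in Python.  It models a delta as a list of tuples rather than the
binary encoding Git actually uses, and the names (apply_delta, the
"copy" and "insert" tags) are made up for illustration only.

```python
# Abstract model of a copy/insert delta.  The tuple representation is
# an assumption for illustration; the real format is a binary encoding.
def apply_delta(source: bytes, ops) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            # ("copy", offset, length): take bytes from the source material
            _, offset, length = op
            if offset < 0 or length < 0 or offset + length > len(source):
                raise ValueError("copy runs outside the source material")
            out += source[offset:offset + length]
        elif op[0] == "insert":
            # ("insert", data): literal bytes carried in the delta itself
            out += op[1]
        else:
            raise ValueError("unknown instruction: %r" % (op[0],))
    return bytes(out)


source = b"hello world\n"
ops = [("copy", 0, 6), ("insert", b"git"), ("copy", 11, 1)]
result = apply_delta(source, ops)  # b"hello git\n"
```

Note that every "copy" is expressed in byte offsets into the source,
which is why the producer of the delta has to know the source.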

As an example, think about the case where you have *,v files used by
RCS (and CVS).  The "foreign changes" given to you by that format are
a series of instructions that roughly correspond to an "ed" script:
insert these lines at line number L, delete N lines starting at line
number K, etc.  In order to convert such a change into xdelta, you
would need to know which byte offsets in the original file these
line numbers correspond to.  You may also want to know what the Git
object name for the original is, although in the fast-import stream
you might be able to get away with using the object mark facility.
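
The line-number-to-byte-offset mapping can be sketched as below.
This is only an illustration with hypothetical helper names
(line_starts, line_span), and it assumes lines are terminated by LF;
the point is that it cannot be computed without reading the original.

```python
def line_starts(data: bytes):
    """Byte offset at which each line begins, plus an end sentinel."""
    starts = [0]
    for i, byte in enumerate(data):
        if byte == 0x0A:  # LF ends a line; the next line starts after it
            starts.append(i + 1)
    if starts[-1] != len(data):
        # The last line has no trailing LF; close it at end of data.
        starts.append(len(data))
    return starts


def line_span(data: bytes, first_line: int, count: int):
    """Translate "count lines from 1-based first_line" to (offset, length)."""
    starts = line_starts(data)
    begin = starts[first_line - 1]
    end = starts[first_line - 1 + count]
    return begin, end - begin


original = b"one\ntwo\nthree\n"
offset, length = line_span(original, 2, 1)  # the line b"two\n"
```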

Assuming that you do have and are willing to read the original file,
you have three possible (and one impractical) approaches:

 - Apply the foreign changes to the original file yourself (as that
   is the foreign system you are interested in, you know how to do
   that much better than Git does), and produce xdelta between the
   original and the result using only the original and the result.

 - Apply the foreign changes to the original file yourself, and feed
   the resulting content to fast-import in full, letting fast-import
   convert into the format Git understands.

 - Interpret the foreign changes, using the original file as a
   reference, to convert it into xdelta.

 - Teach fast-import how to interpret various formats that are used
   to express foreign changes, and feed that.

In the first approach, the "given the original and the result,
produce xdelta between them" part can be reused by other people's
systems.  You may be able to borrow diff-delta.c from us under our
licensing terms.
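
As a rough sketch of "given the original and the result, produce the
copy/insert instructions", the following uses Python's difflib.  This
is not the algorithm in diff-delta.c and it does not emit Git's
binary encoding; the tuple form and the names make_ops and apply_ops
are assumptions for illustration.

```python
from difflib import SequenceMatcher


def make_ops(original: bytes, result: bytes):
    """Derive abstract copy/insert instructions from two full contents."""
    matcher = SequenceMatcher(None, original, result, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Bytes shared with the original become a copy instruction.
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # New bytes are carried literally.
            ops.append(("insert", result[j1:j2]))
        # "delete" needs no instruction: those bytes are just not copied.
    return ops


def apply_ops(original: bytes, ops) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += original[op[1]:op[1] + op[2]]
        else:
            out += op[1]
    return bytes(out)


original = b"line one\nline two\nline three\n"
result = b"line one\nline 2\nline three\nline four\n"
ops = make_ops(original, result)
assert apply_ops(original, ops) == result  # round trip
```

Nothing here depends on the foreign system, which is what makes this
part reusable.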

The second is the most straightforward; deltification will
eventually happen when the resulting repository is repacked, using
the same code from diff-delta.c.
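
Feeding a full object is little more than the following sketch.  The
helper name write_blob is hypothetical; the commands it writes
("blob", "mark", "data") are the ones documented for the fast-import
stream, with the content given by exact byte count.

```python
import io


def write_blob(stream, mark: int, content: bytes) -> None:
    """Write one full blob to a fast-import stream (binary file object)."""
    stream.write(b"blob\n")
    stream.write(b"mark :%d\n" % mark)             # lets later commands refer to it
    stream.write(b"data %d\n" % len(content))      # exact byte count
    stream.write(content)
    stream.write(b"\n")                            # optional LF after the raw data


buf = io.BytesIO()
write_blob(buf, 1, b"hello\n")
stream_bytes = buf.getvalue()
```

In real use the stream would be something like the standard input of
a running "git fast-import" process rather than an in-memory buffer.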

The third would be "*,v expresses the source location and length in
terms of lines, so look at the original to convert these into byte
offset and byte length xdelta wants", which I would think is silly.

And the last one is a maintenance nightmare I do not think we would
want to touch with a ten-foot pole.

In short, the most practical solution would be to reconstitute a
full object and feed that to fast-import, unless you already have
xdelta or you can turn your foreign change into xdelta without ever
looking at the original.