Re: Tackling Git Limitations with Singular Large Line-separated Plaintext files

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 27 Jun 2014 10:48:49 -0700

Shawn Pearce <spearce@xxxxxxxxxxx> writes:

> Git does source code well. I don't know enough to judge if DNA/RNA
> sequence storage is similar enough to source code to benefit from
> things like `git log -p` showing deltas over time, or if some other
> algorithm would be more effective.
>
>> From my understanding the largest problem revolves around git's delta
>> discovery method, holding 2 files in memory at once - is there a
>> reason this could not be adapted to page/chunk the data in a sliding
>> window fashion ?
>
> During delta discovery Git holds like 11 files in memory at once....

Even though the original question mentioned "delta discovery", I
think what was being asked is not about "delta" in the Git sense
(which your answer is about) but rather "can we diff two long
sequences of text (which happen to use only a 4-letter alphabet, but
that is an irrelevant detail) without holding both in-core in their
entirety?", which is the more relevant question from the
application's point of view.

"Is there a reason this could not be adapted?"  No, there is no
particular reason why this "could not" be done.  I think the only
reason we do diff only in-core is that "adapting to page/chunk"
hasn't been high on anybody's list of itches to scratch.
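
To make the "page/chunk in a sliding window" idea concrete, here is a
rough sketch in Python (not Git's C, and not how Git's diff machinery
works today). The function name windowed_diff and the window parameter
are made up for illustration. It keeps at most `window` lines from
each side in memory, diffs the two buffers with difflib, emits
everything up to the last matching run, and carries the unmatched tail
into the next round. The result is always a valid edit script, but it
is not guaranteed to be minimal: a change larger than the window makes
it lose alignment.

```python
import difflib
from itertools import islice


def windowed_diff(a_lines, b_lines, window=1000):
    """Yield (tag, line) pairs, tag in {'equal', 'delete', 'insert'}.

    Illustrative only: holds at most `window` lines per side in memory.
    """
    a_iter, b_iter = iter(a_lines), iter(b_lines)
    a_buf, b_buf = [], []
    while True:
        # Top both buffers up to the window size.
        a_buf.extend(islice(a_iter, window - len(a_buf)))
        b_buf.extend(islice(b_iter, window - len(b_buf)))
        if not a_buf and not b_buf:
            return
        # A short buffer after refilling means that input is exhausted.
        at_end = len(a_buf) < window and len(b_buf) < window
        matcher = difflib.SequenceMatcher(None, a_buf, b_buf, autojunk=False)
        ops = matcher.get_opcodes()
        if not at_end:
            # Hold back the trailing non-matching ops so they can be
            # re-diffed with more context in the next round.
            equal_idx = [k for k, op in enumerate(ops) if op[0] == "equal"]
            if equal_idx:
                ops = ops[: equal_idx[-1] + 1]
            # With no match at all in the window, flush everything so
            # that the loop always makes progress.
        a_used, b_used = ops[-1][2], ops[-1][4]
        for tag, i1, i2, j1, j2 in ops:
            if tag == "equal":
                for line in a_buf[i1:i2]:
                    yield ("equal", line)
            else:
                for line in a_buf[i1:i2]:
                    yield ("delete", line)
                for line in b_buf[j1:j2]:
                    yield ("insert", line)
        # Drop what was emitted; the remainder slides into the next window.
        del a_buf[:a_used]
        del b_buf[:b_used]
```

Since open file objects iterate line by line, two files opened for
reading can be passed in directly, so neither has to be read into
memory in full. The cost is that diff quality depends on the window
size.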