Hi Michael -
On 4/22/19 6:26 PM, michael@xxxxxxxxx wrote:
+ int *matching_lines = fuzzy_find_matching_lines(parent->file.ptr,
+ target->file.ptr,
+ parent->line_starts,
+ target->line_starts,
+ e->s_lno + offset,
+ e->s_lno,
+ parent_len,
+ e->num_lines);
Here was the issue I ran into, and it's due to translations between
e->s_lno and the parent's "address space".
The short version is that "e->s_lno + offset" is not always the
parent_slno, and parent_len is based off of parent_slno.
guess_line_blames() gives you parent_slno and parent_len, as well as
offset. 'offset' is how you convert from the target's space to the
parent's. parent_slno and parent_len describe the whole chunk given to
us from the diff engine. However, there may be multiple blame_entries
covering that chunk.
So e->s_lno is in the target, but it's not necessarily the beginning of
the entire diff chunk. This is related to that page fault you found a
while back.
Passing e->s_lno + offset for where fuzzy() starts looking in the parent
is fine, but then the length in the parent needs to be adjusted. For
instance, I have this at the top of my modified
fuzzy_find_matching_lines() (changed to take the origins and variables
from guess_line_blames()):
// XXX conversions to michael's variable names
int start_a = e->s_lno + offset;
//int length_a = parent_len; // XXX this fails the test
int length_a = (parent_slno + parent_len) - (e->s_lno + offset);
int start_b = e->s_lno;
int length_b = e->num_lines;
Plus we need a check for length_a <= 0. I had to work to make it be
negative, but it's possible. parent_slno = tlno + offset, so we're
looking at:
length_a = tlno + parent_len - e->s_lno;
That just requires a blame entry split such that e->s_lno > tlno, and a
parent chunk that had 0 lines. I found a case that did that. Basically
in one commit you add a bunch of lines. In another, you change one line
in the middle of that bunch. That causes a split of the diff chunk into
more than one, such that e->s_lno > tlno. That original commit only
added lines, so parent_len == 0.
The intuition for the "negative length_a" isn't that the parent_len is
negative, it's that the e->s_lno chunk (when offset) is outside the
window of the parent's change. I have a simple test for this.
Oh, and we have to length_a == 0, due to this:
max_search_distance = length_a - 1;
Anyway, I'll take what I've got and apply your latest and see what I
come up with. =) Plus, I have fixes for all of the other stuff brought
up in the v6 discussion.
Barret