Bo Yang <struggleyb.nku@xxxxxxxxx> writes:
> Jonathan Nieder <jrnieder@xxxxxxxxx> writes:
>
> > Hmm, I can imagine some (mutually inconsistent) heuristics:
> >
> >  - Suppose in the blamed commit a single isolated line changed.  Then
> >    it is clear where to look next.
> >
> >  - If the mystery code is at the beginning of the file (resp.
> >    beginning of a diff -C0 hunk), maybe it was based on the line at the
> >    same position within the previous commit.
> >
> >  - Take the line with the lowest Levenshtein distance from the mystery
> >    code.
> >
> >  - Expect certain common patterns of change: substituted words,
> >    whitespace changes, added arguments for a function, things like that.
> >
> > That said, I still dont have a clear picture of a basic strategy.
>
> I can't understand fully about your above strategy. I think we can
> category the code change into two cases:
>
> 1. The diff looks like this:
>
> @@ -1008,29 +1000,29 @@ int cmd_format_patch(int argc, const char
> **argv, const char *prefix)
>           add_signoff = xmemdupz(committer, endpos - committer + 1);
>       }
>
> -     for (i = 0; i < extra_hdr_nr; i++) {
> -         strbuf_addstr(&buf, extra_hdr[i]);
> +     for (i = 0; i < extra_hdr.nr; i++) {
> +         strbuf_addstr(&buf, extra_hdr.items[i].string);
>           strbuf_addch(&buf, '\n');
>       }

Errr... how does the first line in the preimage differ from the first
line in the postimage? They look as if they are the same:

  -     for (i = 0; i < extra_hdr_nr; i++) {
  +     for (i = 0; i < extra_hdr.nr; i++) {

>
> i.e. there is both deletion and addition in a change. And this means we
> modify some lines of the code. So, what we do will be tracing the two
> 'minus' lines and then find another diff. Start trace from that diff
> recursively.
>
> Yes, the new added code may also be moved or copied from other place.
> But, I think here, we should focus on the lines before this changeset.

The problem arises when you ask about tracking a subset of the lines
that appear in the postimage of a patch.
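Jonathan's Levenshtein heuristic can be made concrete for Bo's case 1: for a given '+' line, pick the '-' line of the same hunk with the smallest edit distance. The C below is a minimal sketch under that assumption, using the textbook two-row dynamic-programming edit distance. `levenshtein` and `closest_preimage_line` are hypothetical helpers, not part of git, and the approach is not taken from git-blame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Levenshtein (edit) distance between two NUL-terminated strings,
 * using two rows of the dynamic-programming table.
 * Returns (size_t)-1 if memory allocation fails.
 */
static size_t levenshtein(const char *a, const char *b)
{
	size_t la, lb, i, j, result;
	size_t *prev, *cur, *tmp;

	assert(a != NULL && b != NULL);
	la = strlen(a);
	lb = strlen(b);
	prev = malloc((lb + 1) * sizeof(*prev));
	cur = malloc((lb + 1) * sizeof(*cur));
	if (!prev || !cur) {
		free(prev);
		free(cur);
		return (size_t)-1;
	}

	/* Row 0: turning "" into b[0..j) costs j insertions. */
	for (j = 0; j <= lb; j++)
		prev[j] = j;

	for (i = 1; i <= la; i++) {
		cur[0] = i; /* turning a[0..i) into "" costs i deletions */
		for (j = 1; j <= lb; j++) {
			size_t del = prev[j] + 1;
			size_t ins = cur[j - 1] + 1;
			size_t sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
			size_t best = del < ins ? del : ins;
			cur[j] = best < sub ? best : sub;
		}
		/* The row just computed becomes the previous row. */
		tmp = prev;
		prev = cur;
		cur = tmp;
	}
	result = prev[lb];
	free(prev);
	free(cur);
	return result;
}

/*
 * Hypothetical helper: given the '-' lines of one hunk (preimage) and
 * one '+' line (the postimage line being traced), return the index of
 * the preimage line with the smallest edit distance, or -1 if the hunk
 * has no preimage lines. Ties go to the earliest line.
 */
static int closest_preimage_line(const char **pre, int nr, const char *post)
{
	int i, best_idx = -1;
	size_t best = (size_t)-1;

	for (i = 0; i < nr; i++) {
		size_t d = levenshtein(pre[i], post);
		if (best_idx < 0 || d < best) {
			best = d;
			best_idx = i;
		}
	}
	return best_idx;
}
```

For Bo's first example, the closest preimage line to the new `strbuf_addstr` line is the old `strbuf_addstr` line (13 inserted characters), not the `for` line. The hard part this sketch leaves open is the cutoff: when even the best distance is large relative to the line length, the line is more likely new code than a modification.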
For example, if we ask for the history of the

  strbuf_addstr(&buf, extra_hdr.items[i].string);

line, should we also track the history of the

  for (i = 0; i < extra_hdr.nr; i++) {

line, which appears in the relevant diff chunk? If not, how should we
detect which line in the preimage (if any) corresponds to a given line
in the postimage?

> 2. The diff looks like:
>
> @@ -879,9 +885,12 @@ int cmd_grep(int argc, const char **argv, const
> char *prefix)
>       opt.regflags = REG_NEWLINE;
>       opt.max_depth = -1;
>
> +     strcpy(opt.color_context, "");
>       strcpy(opt.color_filename, "");
> +     strcpy(opt.color_function, "");
>       strcpy(opt.color_lineno, "");
>       strcpy(opt.color_match, GIT_COLOR_BOLD_RED);
>
> This means, the code here is added from scratch. Here, I think we have
> three options.
> 1. Find if the new code is moved here from other place.
> 2. Find if the new code is copied from other place.
> 3. We find the end of the history, so stop here.
>
> The problems remain how do we find the copied/moved code. The new
> added code may be copied/moved from multiple place with little
> changes.

I guess that you could take a look at how git-blame handles this...
but I think you would end up with something like a generalization of an
ordinary patch, where the preimage of a chunk can come from a different
place or a different file.

P.S. I like it that you provide real-life examples. They really help
with understanding what you are talking about.

-- 
Jakub Narebski
Poland
ShadeHawk on #git
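Bo's case 2 asks how to find code that was moved or copied. A naive C sketch, assuming the added lines and one candidate source (another file, or another part of the same file) have already been split into lines, finds the longest run of consecutive added lines that also appears consecutively in the candidate, ignoring leading and trailing whitespace. `longest_copied_run` and `lines_equal_trimmed` are hypothetical helpers; this does not reproduce what git-blame actually does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare two lines, ignoring leading and trailing whitespace. */
static int lines_equal_trimmed(const char *a, const char *b)
{
	size_t la, lb;

	assert(a != NULL && b != NULL);
	while (isspace((unsigned char)*a))
		a++;
	while (isspace((unsigned char)*b))
		b++;
	la = strlen(a);
	lb = strlen(b);
	while (la && isspace((unsigned char)a[la - 1]))
		la--;
	while (lb && isspace((unsigned char)b[lb - 1]))
		lb--;
	return la == lb && !memcmp(a, b, la);
}

/*
 * Hypothetical helper: find the longest run of consecutive added lines
 * that also occurs as consecutive lines in a candidate source.
 * Brute force, O(nr_added * nr_src * run length); a real version would
 * hash lines first. Returns the run length and stores the start indices
 * in *added_start and *src_start (both -1 if nothing matches).
 */
static int longest_copied_run(const char **added, int nr_added,
			      const char **src, int nr_src,
			      int *added_start, int *src_start)
{
	int i, j, best = 0;

	*added_start = *src_start = -1;
	for (i = 0; i < nr_added; i++) {
		for (j = 0; j < nr_src; j++) {
			int k = 0;
			while (i + k < nr_added && j + k < nr_src &&
			       lines_equal_trimmed(added[i + k], src[j + k]))
				k++;
			if (k > best) {
				best = k;
				*added_start = i;
				*src_start = j;
			}
		}
	}
	return best;
}
```

Since Bo notes that added code may come from several places with small changes, a fuller version would run this against several candidates, repeat it on the lines still unmatched, and replace exact trimmed equality with a fuzzy comparison such as edit distance. Lines that match nothing anywhere correspond to his option 3: the end of the history.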