(+cc: Duy and Baz, who commented before) Yann Dirson wrote: > This feature tries to group together files moving from and to > identical directories - the most common case being directory renames. > > It is activated by the new --detect-dir-renames diffcore > flag. > > This is only the first step, adding the basic functionnality and > adding support to raw diff output (and it breaks unified-diff output > which does not know how to handle that stuff yet). [...] > For now both the result > of "mv a b" and "mv a/* b/" are displayed as "Rnnn a/ b/", which is > probably not what we want. "Rnnn a/* b/" could be a good choice for > the latter if we want them to be distinguished, and even if we want > them to look the same. Example? > Other future developements to be made on top of this include: > * extension of unified-diff format to express this [...] > --- Oh, so this is for diff --raw only. For the confused: the discussion from v3 perhaps explains why. But: 1. Just like --word-diff, this could be used as a user-facing feature for diffs that cannot be applied 2. Even if no one agrees on the proper diff format, it would be good plumbing to have. v3: http://thread.gmane.org/gmane.comp.version-control.git/99780/focus=99782 v2: http://thread.gmane.org/gmane.comp.version-control.git/99529/focus=99528 v1: http://thread.gmane.org/gmane.comp.version-control.git/94612/focus=96807 I kind of like the cover letter to v1: The idea is to add a new pass to diffcore-rename, to group file renames looking like a full directory move, and then to hide those file renames which do not carry any additionnal information. Here is a sample run: $ ./git-diff-tree --abbrev=6 ee491 --factorize-renames -r [DBG] possible rename from arm/ to bar/ [DBG] possible rename from ppc/ to moved/ [DBG] discarding dir rename of arm/, mixing moved/ and bar/ [DBG] ppc/* -> moved/* makes ppc/sha1ppc.S -> moved/sha1ppc.S uninteresting [DBG] ppc/* -> moved/* makes ppc/sha1.c -> moved/sha1.c uninteresting ee491a42190ec6e716f46a55fa0a7f4e307f1629 :040000 040000 000000... 000000... R100 ppc/ moved/ :100644 100644 9e3ae0... 9e3ae0... R100 arm/sha1.c bar/sha1.c :100644 100644 395264... 395264... R100 arm/sha1.h bar/sha1.h :100644 100644 c3c51a... c065ee... R099 ppc/sha1.h moved/sha1.h :100644 100644 8c1cb9... 8c1cb9... R100 arm/sha1_arm.S moved/sha1_arm.S Presumably this patch takes care of the first step (grouping potential full-directory moves) and not the second (hiding the redundant file renames). NaÃve review: > --- a/diff-lib.c > +++ b/diff-lib.c > @@ -208,7 +208,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option) > ce_option, &dirty_submodule); > if (!changed && !dirty_submodule) { > ce_mark_uptodate(ce); > - if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER)) > + if (!DIFF_OPT_TST(&revs->diffopt, FIND_COPIES_HARDER) && > + !DIFF_OPT_TST(&revs->diffopt, DETECT_DIR_RENAMES)) > continue; Hm, why? > @@ -338,7 +339,8 @@ static int show_modified(struct rev_info *revs, [...] Likewise. [...] > --- a/diffcore-rename.c > +++ b/diffcore-rename.c > @@ -11,6 +11,7 @@ > static struct diff_rename_dst { > struct diff_filespec *two; > struct diff_filepair *pair; > + int i_am_not_single:1; // does not look for a match, only here to be looked at > } *rename_dst; What does single mean? > @@ -49,9 +50,36 @@ static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two, [...] > > +static struct diff_rename_dst *locate_rename_dst_dir(struct diff_filespec *dir) > +{ > + /* code mostly duplicated from locate_rename_dst - not sure we > + * could merge them efficiently,though > + */ > + int first, last; > + int prefixlength = strlen(dir->path); > + > + first = 0; > + last = rename_dst_nr; > + while (last > first) { > + int next = (last + first) >> 1; > + struct diff_rename_dst *dst = &(rename_dst[next]); > + int cmp = strncmp(dir->path, dst->two->path, prefixlength); > + if (!cmp) > + return dst; > + if (cmp < 0) { > + last = next; > + continue; > + } > + first = next+1; > + } > + /* not found */ > + return NULL; > +} Binary search --- this is just a way to index into the sorted list of rename_dsts, right? At first I thought it was searching for a good rename destination. A comment (or overview in the log message) could help clarify. > @@ -386,8 +414,11 @@ static int find_exact_renames(void) > for (i = 0; i < rename_src_nr; i++) > insert_file_table(&file_table, -1, i, rename_src[i].one); > > - for (i = 0; i < rename_dst_nr; i++) > + for (i = 0; i < rename_dst_nr; i++) { > + if (rename_dst[i].i_am_not_single) > + continue; > insert_file_table(&file_table, 1, i, rename_dst[i].two); > + } What is this code path for? (Sorry for the tedious questions. My thinking is, if I cannot answer them without doing some legwork, how could the person about to break your code who is only glancing for a moment before plunging forward on an exciting new feature?) > @@ -414,6 +445,180 @@ static void record_if_better(struct diff_score m[], struct diff_score *o) > m[worst] = *o; > } > > +struct diff_dir_rename { > + struct diff_filespec *one; > + struct diff_filespec *two; > + int discarded; > + struct diff_dir_rename* next; > +}; Okay. > +// FIXME: prevent possible overflow > +/* > + * Copy dirname of src into dst, suitable to append a filename without > + * an additional "/". > + * Only handles relative paths since there is no absolute path in a git repo. > + * Writes "" when there is no "/" in src. > + * May overwrite more chars than really needed, if src ends with a "/". > + */ > +static const char* copy_dirname(char* dst, const char* src) > +{ > + char* lastslash = strrchr(src, '/'); > + if (!lastslash) { > + *dst = '\0'; > + return dst; > + } > + strncpy(dst, src, lastslash - src + 1); memcpy? > + dst[lastslash - src + 1] = '\0'; > + > + // if src ends with a "/" strip the last component > + if (lastslash[1] == '\0') { > + lastslash = strrchr(dst, '/'); > + if (!lastslash) > + return strcpy(dst, "."); > + lastslash[1] = '\0'; > + } It might be easier to read like this: /* Write name of the parent directory of src to dest. */ static char *remove_last_component(char *dst, const char *src) { size_t len = strlen(src); const char *slash; if (len > 0 && src[len - 1] == '/') /* Trailing slash. Ignore it. */ len--; slash = memrchr(src, '/', len); if (!slash) { *dst = '\0'; return dst; } *mempcpy(dst, src, slash - src) = '\0'; return dst; } Requires the glibc-specific function memrchr(), but that is easy to implement. Compare strchrnul() [in git-compat-util.h]. [...] > +static struct diff_dir_rename* factorization_candidates = NULL; > +static void diffcore_factorize_renames(void) Maybe this could be refactored into multiple functions? > @@ -451,13 +656,22 @@ void diffcore_rename(struct diff_options *options) > p->one->rename_used++; > register_rename_src(p->one, p->score); > } > + else { [...] > + if (DIFF_OPT_TST(options, DETECT_DIR_RENAMES)) { > + /* similarly, rename factorization needs to > + * see all files from second tree, but we don't > + * want them to be matched against single sources. > + */ > + locate_rename_dst(p->two, 1)->i_am_not_single = 1; Huh? > @@ -509,6 +723,8 @@ void diffcore_rename(struct diff_options *options) > > if (rename_dst[i].pair) > continue; /* dealt with exact match already. */ > + if (rename_dst[i].i_am_not_single) > + continue; /* not looking for a match. */ Oh, not single means "not seeking a new relationship". > @@ -569,7 +785,28 @@ void diffcore_rename(struct diff_options *options) > /* At this point, we have found some renames and copies and they > * are recorded in rename_dst. The original list is still in *q. > */ > + > + /* Now possibly factorize those renames and copies. */ > + if (DIFF_OPT_TST(options, DETECT_DIR_RENAMES)) > + diffcore_factorize_renames(); Huh? Factorize? > DIFF_QUEUE_CLEAR(&outq); > + > + // Now turn non-discarded factorization_candidates into real renames > + struct diff_dir_rename* candidate; > + for (candidate=factorization_candidates; candidate; candidate = candidate->next) { > + struct diff_filepair* pair; > + if (candidate->discarded) continue; > + // visualize toplevel dir if needed - FIXME: wrong place for this ? > + if (!*candidate->one->path) > + candidate->one->path = "./"; > + if (!*candidate->two->path) > + candidate->two->path = "./"; > + pair = diff_queue(&outq, candidate->one, candidate->two); > + pair->score = MAX_SCORE; > + pair->renamed_pair = 1; > + } Looks like I'll need to read diffcore_factorize_renames() after all. :( Sorry, not much to say yet. Hopefully some of that can be useful nonetheless. Jonathan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html