From: Michael Platings <michael@xxxxxxxxx>

Hi Git devs,

Some of you may be familiar with the git-hyper-blame tool [1]. It's "useful if you have a commit that makes sweeping changes that are unlikely to be what you are looking for in a blame, such as mass reformatting or renaming."

git-hyper-blame is useful, but (a) it's not convenient to install; (b) it's missing functionality available in regular git blame; (c) its method of matching lines between chunks is too simplistic for many use cases; and (d) it's not part of Git, so it doesn't integrate well with tools that expect Git, e.g. vim plugins. Therefore I'm hoping to add similar, and hopefully superior, functionality to Git itself. I have a very rough patch, so I'd like to get your thoughts on the general approach, particularly in terms of its user-visible behaviour.

My initial idea was to lift the design directly from git-hyper-blame. However, the approach of picking single revisions to somehow ignore doesn't sit well with the -w, -M & -C options, which have a similar intent but apply to all revisions. I'd like to get your thoughts on whether we could allow applying the -M or -w options to specific revisions. For example, imagine it was agreed that all the #includes in a project should be reordered. In that case, it would be useful to be able to specify that the -M option should be used for blames on the reordering revision specifically, so that in future, when someone wants to know why a #include was added, they don't have to run git blame twice to find out. Options that are specific to a particular revision could be stored in a ".gitrevisions" file or similar.

If the principle of allowing blame options to be applied per-revision is agreeable, then I'd like to add a -F/--fuzzy option to sit alongside -w, -M & -C.

I've implemented a prototype "fuzzy" option; the patch is attached. The option operates at the level of diff chunks.
For each line in the "after" half of the chunk, it uses a heuristic to choose which line in the "before" half of the chunk matches best. The heuristic I'm using at the moment is matching "bigrams", as described in [2].

The initial pass typically gives reasonable results, but it can jumble up the lines. Because in the reformatting/renaming use case the content should stay in the same order, it's worth going to extra effort to avoid jumbling lines. Therefore, after the initial pass, the line that can be matched with the most confidence is used to partition the chunk into halves before and after it. The process is then repeated recursively on the halves above and below the partition line.

I feel like a similar algorithm has probably already been invented in a better form; if anyone knows of such a thing, then please let me know!

I look forward to hearing your thoughts.

Thanks,
-Michael

[1] https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/git-hyper-blame.html
[2] https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient

Michael Platings (1):
  Add git blame --fuzzy option.

 blame.c                | 352 +++++++++++++++++++++++++++++++++++++++++++++++--
 blame.h                |   1 +
 builtin/blame.c        |   3 +
 t/t8020-blame-fuzzy.sh | 264 +++++++++++++++++++++++++++++++++++++
 4 files changed, 609 insertions(+), 11 deletions(-)
 create mode 100755 t/t8020-blame-fuzzy.sh

-- 
2.14.3 (Apple Git-98)
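The chunk-matching heuristic described above (bigram similarity, then recursive partitioning around the most confident match) can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the code from the attached patch. The names `dice_bigram`, `fuzzy_match` and `fuzzy_match_chunk` are hypothetical. Lines are compared as raw byte strings with no whitespace normalisation, and similarity is the Sorensen-Dice coefficient over character bigrams [2]. As an additional assumption of this sketch, the confidently matched "before" line stays available to both halves, so a line that reformatting split in two can have both pieces attributed to the same original line.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sorensen-Dice similarity over character bigrams (multiset version):
 * 2 * |shared bigrams| / (|bigrams(a)| + |bigrams(b)|), in [0, 1].
 * Strings shorter than 2 bytes have no bigrams, so they only score 1.0
 * when they are identical. */
double dice_bigram(const char *a, const char *b)
{
	assert(a && b);
	size_t la = strlen(a), lb = strlen(b);
	if (la < 2 || lb < 2)
		return strcmp(a, b) == 0 ? 1.0 : 0.0;

	size_t na = la - 1, nb = lb - 1, shared = 0;
	char *used = calloc(nb, 1); /* marks bigrams of b already paired */
	if (!used)
		return 0.0;
	for (size_t i = 0; i < na; i++) {
		for (size_t j = 0; j < nb; j++) {
			if (!used[j] && a[i] == b[j] && a[i + 1] == b[j + 1]) {
				used[j] = 1; /* each bigram of b pairs at most once */
				shared++;
				break;
			}
		}
	}
	free(used);
	return 2.0 * (double)shared / (double)(na + nb);
}

/* Match after-lines [a_lo, a_hi) against before-lines [b_lo, b_hi).
 * match[i] receives the before-line index chosen for after-line i. */
void fuzzy_match(const char **after, int a_lo, int a_hi,
		 const char **before, int b_lo, int b_hi, int *match)
{
	int best_a = -1, best_b = -1;
	double best = 0.0;

	if (a_lo >= a_hi || b_lo >= b_hi)
		return;
	/* Scan every pair in range and keep the single most confident one;
	 * this equals taking each after-line's best before-line and then
	 * the highest-scoring of those. Ties keep the earliest pair. */
	for (int i = a_lo; i < a_hi; i++) {
		for (int j = b_lo; j < b_hi; j++) {
			double s = dice_bigram(after[i], before[j]);
			if (s > best) {
				best = s;
				best_a = i;
				best_b = j;
			}
		}
	}
	if (best_a < 0)
		return; /* no pair in this range shares any bigram */
	match[best_a] = best_b;

	/* Partition around the confident pair so line order is preserved.
	 * Assumption: before-line best_b is reused by both halves. */
	fuzzy_match(after, a_lo, best_a, before, b_lo, best_b + 1, match);
	fuzzy_match(after, best_a + 1, a_hi, before, best_b, b_hi, match);
}

/* Entry point for one diff chunk; unmatched after-lines stay -1. */
void fuzzy_match_chunk(const char **after, int na,
		       const char **before, int nb, int *match)
{
	for (int i = 0; i < na; i++)
		match[i] = -1;
	fuzzy_match(after, 0, na, before, 0, nb, match);
}
```

For a chunk with before = {"foo(a, b);", "bar(c);"} and after = {"foo(a,", "    b);", "bar(c);"}, `fuzzy_match_chunk` yields {0, 0, 1}: the identical `bar(c);` pair is fixed first, then both halves of the split call are attributed to `foo(a, b);`. Scores are recomputed at every recursion level, so this sketch is quadratic per level; a real implementation would probably cache them.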