Re: [PATCH 3/3] builtin/blame.c: large-scale rewrite

Junio C Hamano <gitster@xxxxxxxxx> · Mon, 27 Jan 2014 12:51:33 -0800

David Kastrup <dak@xxxxxxx> writes:

> Junio C Hamano <gitster@xxxxxxxxx> writes:
>
>> David Kastrup <dak@xxxxxxx> writes:
>>
>>> The previous implementation uses a sorted linear list of struct
>>> blame_entry in a struct scoreboard for organizing all partial or
>>> completed work.  Every task that is done requires going through the
>>> whole list where most entries are not relevant to the task at hand.
>>>
>>> This commit reorganizes the data structures in order to have each
>>> remaining subtask work with its own sorted linear list it can work off
>>> front to back.  Subtasks are organized into "struct origin" chains
>>> hanging off particular commits.  Commits are organized into a priority
>>> queue, processing them in commit date order in order to keep most of
>>> the work affecting a particular blob collated even in the presence of
>>> an extensive merge history.  In that manner, linear searches can be
>>> basically avoided anywhere.  They still are done for identifying one
>>> of multiple analyzed files in a given commit, but the degenerate case
>>> of a single large file being assembled from a multitude of smaller
>>> files in the past is not likely to occur often enough to warrant
>>> special treatment.
>>> ---
>>
>> Sign-off?
>
> Not while this is not fit for merging because of #if 0 etc and missing
> functionality.  This is just for review.

That is not what Signing off a patch means, though ;-)

>> Actually, I'd like to take my previous suggestion to add this as
>> blame2 (to replace blame in the future) back.  That was based on my
>> fear/hope to see an implementation based on a completely different
>> approach, but the basic premise of having one blame_entry record per
>> the lines of the final image in the scoreboard, using diff between
>> parents to the child to find common lines for passing the blame up,
>> etc. have not changed at all and the change is all about organizing
>> the pieces of data in a *much* *better* way to avoid needlessly
>> finding what we already have computed.
>
> Yes.  Basically, the call graph outside of blame.c itself should be
> pretty much the same.
> ...
> Ok.  I'm also certain to have a few "space between function name and
> arguments" left and will grep for those before submitting the next
> version.
>
> Next version will at least include -M option, possibly leaving -C for
> later.

OK.  As we do not want to break the build in the middle of the
series, but we still want to keep the individual steps reviewable
incrementally, I would think the best structure would be have the
series that consists of multiple steps "the basic infrastructure is
there, but no rename following, and neither -M or -C works", "now
renames are followed", "now -M works", etc., with tests that are not
yet working (in the beginning, most of "git blame" test may no
longer work until the series is finished) marked with

    s/test_expect_success/test_expect_failure/

and turn expect_failure into expect_success as the series advances.
That way, we may get a better picture of what each step achieves.  I
dunno.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html