Re: [PATCH 3/3] builtin/blame.c: large-scale rewrite

David Kastrup <dak@xxxxxxx> · Mon, 27 Jan 2014 20:45:06 +0100

Junio C Hamano <gitster@xxxxxxxxx> writes:

> David Kastrup <dak@xxxxxxx> writes:
>
>> The previous implementation uses a sorted linear list of struct
>> blame_entry in a struct scoreboard for organizing all partial or
>> completed work.  Every task that is done requires going through the
>> whole list where most entries are not relevant to the task at hand.
>>
>> This commit reorganizes the data structures in order to have each
>> remaining subtask work with its own sorted linear list it can work off
>> front to back.  Subtasks are organized into "struct origin" chains
>> hanging off particular commits.  Commits are organized into a priority
>> queue, processing them in commit date order in order to keep most of
>> the work affecting a particular blob collated even in the presence of
>> an extensive merge history.  In that manner, linear searches can be
>> basically avoided anywhere.  They still are done for identifying one
>> of multiple analyzed files in a given commit, but the degenerate case
>> of a single large file being assembled from a multitude of smaller
>> files in the past is not likely to occur often enough to warrant
>> special treatment.
>> ---
>
> Sign-off?

Not while this is not fit for merging because of #if 0 etc and missing
functionality.  This is just for review.

> Actually, I'd like to take my previous suggestion to add this as
> blame2 (to replace blame in the future) back.  That was based on my
> fear/hope to see an implementation based on a completely different
> approach, but the basic premise of having one blame_entry record per
> the lines of the final image in the scoreboard, using diff between
> parents to the child to find common lines for passing the blame up,
> etc. have not changed at all and the change is all about organizing
> the pieces of data in a *much* *better* way to avoid needlessly
> finding what we already have computed.

Yes.  Basically, the call graph outside of blame.c itself should be
pretty much the same.

> Style; please make /* and */ sit on their own line without anything
> else in multi-line comments.

Will do.

>> +	struct origin *next;
>>  	struct commit *commit;
>> +	/* `suspects' contains blame entries that may be attributed to
>> +	 * this origin's commit or to parent commits.  When a commit
>> +	 * is being processed, all suspects will be moved, either by
>> +	 * assigning them to an origin in a different commit, or by
>> +	 * shipping them to the scoreboard's ent list because they
>> +	 * cannot be attributed to a different commit.
>> +	 */
>> +	struct blame_entry *suspects;
>>  	mmfile_t file;
>>  	unsigned char blob_sha1[20];
>>  	unsigned mode;
>> +	/* shipped gets set when shipping any suspects to the final
>> +	 * blame list instead of other commits
>> +	 */
>> +	char shipped;
>
> Unused?

More like "added, forgotten, added under yet another name":

>> +	char guilty;

I'll have to decide which one to keep.

>> +		/* Should be present exactly once in commit chain */
>> +		for (p = o->commit->util; p; l = p, p = p->next) {
>> +			if (p == o)
>> +			{
>
> Style; please have { sit with the control structure that opens the
> block, unless it is the opening brace of the function body or
> struct/union definition.

Ok.

>> +static struct blame_entry *
>> +blame_merge(struct blame_entry *list1,
>> +	    struct blame_entry *list2)
>
> Style; we do not do GNU-ish "function name begins a line".  Even
> though I personally find it easier to use for things like 'grep',
> that is not the style we use around here.

Ok.  I'm also certain to have a few "space between function name and
arguments" left and will grep for those before submitting the next
version.

Next version will at least include -M option, possibly leaving -C for
later.

-- 
David Kastrup

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html