Re: [PATCH 3/3] builtin/blame.c: large-scale rewrite

David Kastrup <dak@xxxxxxx> · Mon, 27 Jan 2014 22:21:16 +0100

Junio C Hamano <gitster@xxxxxxxxx> writes:

> David Kastrup <dak@xxxxxxx> writes:
>
>> Junio C Hamano <gitster@xxxxxxxxx> writes:
>>
>>> David Kastrup <dak@xxxxxxx> writes:
>>>
>>>> The previous implementation uses a sorted linear list of struct
>>>> blame_entry in a struct scoreboard for organizing all partial or
>>>> completed work.  Every task that is done requires going through the
>>>> whole list where most entries are not relevant to the task at hand.
>>>>
>>>> This commit reorganizes the data structures in order to have each
>>>> remaining subtask work with its own sorted linear list it can work off
>>>> front to back.  Subtasks are organized into "struct origin" chains
>>>> hanging off particular commits.  Commits are organized into a priority
>>>> queue, processing them in commit date order in order to keep most of
>>>> the work affecting a particular blob collated even in the presence of
>>>> an extensive merge history.  In that manner, linear searches can be
>>>> basically avoided anywhere.  They still are done for identifying one
>>>> of multiple analyzed files in a given commit, but the degenerate case
>>>> of a single large file being assembled from a multitude of smaller
>>>> files in the past is not likely to occur often enough to warrant
>>>> special treatment.
>>>> ---
>>>
>>> Sign-off?
>>
>> Not while this is not fit for merging because of #if 0 etc and missing
>> functionality.  This is just for review.
>
> That is not what Signing off a patch means, though ;-)

>From Documentation/SubmittingPatches:

    The sign-off is a simple line at the end of the explanation for
    the patch, which certifies that you wrote it or otherwise have
    the right to pass it on as a open-source patch.  The rules are
    pretty simple: if you can certify the below:

            Developer's Certificate of Origin 1.1

            By making a contribution to this project, I certify that:

Well, and the patch is not in a state where I want to contribute it to
this project.  Basically I first want to bring it into a state where
I am not embarrassed by having my name attached to it.

Yes, that's probably not quite the idea of signing off...

> OK.  As we do not want to break the build in the middle of the
> series, but we still want to keep the individual steps reviewable
> incrementally, I would think the best structure would be have the
> series that consists of multiple steps "the basic infrastructure is
> there, but no rename following, and neither -M or -C works", "now
> renames are followed", "now -M works", etc., with tests that are not
> yet working (in the beginning, most of "git blame" test may no
> longer work until the series is finished) marked with
>
>     s/test_expect_success/test_expect_failure/
>
> and turn expect_failure into expect_success as the series advances.
> That way, we may get a better picture of what each step achieves.  I
> dunno.

Seems a bit pointless as the various functionalities are implemented in
separate code passages.  Basically the only "common" code is

static void pass_blame(struct scoreboard *sb, struct origin *origin, int opt)
{
	struct rev_info *revs = sb->revs;
	int i, pass, num_sg;
	struct commit *commit = origin->commit;
[...]

#if 0
	/*
	 * Optionally find moves in parents' files.
	 */
	if (opt & PICKAXE_BLAME_MOVE)
		for (i = 0, sg = first_scapegoat(revs, commit);
		     i < num_sg && sg;
		     sg = sg->next, i++) {
			struct origin *porigin = sg_origin[i];
			if (!porigin)
				continue;
			if (find_move_in_parent(sb, origin, porigin))
				goto finish;
		}

	/*
	 * Optionally find copies from parents' files.
	 */
	if (opt & PICKAXE_BLAME_COPY)
		for (i = 0, sg = first_scapegoat(revs, commit);
		     i < num_sg && sg;
		     sg = sg->next, i++) {
			struct origin *porigin = sg_origin[i];
			if (find_copy_in_parent(sb, origin, sg->item,
						porigin, opt))
				goto finish;
		}
#endif

 finish:
	for (i = 0; i < num_sg; i++) {
		if (sg_origin[i]) {
			drop_origin_blob(sg_origin[i]);
			origin_decref(sg_origin[i]);
		}
	}
	drop_origin_blob(origin);
	if (sg_buf != sg_origin)
		free(sg_origin);
}

So what will happen here is that the #if 0 will get removed again, and
there will be somewhat different code for

        /*
	 * Optionally find moves in parents' files.
	 */

and then somewhat different code for

	/*
	 * Optionally find copies from parents' files.
	 */

and some section elsewhere with the functions being called here.  It's
not like there will be significant intersection between the
"same-file-and-rename" code and the "move-in-file or
copy-from-some-other-file" code.  It's just a separate piece of code,
marked with a separate comment.  Not an "evolution" of code, but rather
a simple addition.  Basically each of those additions does "try blaming
some more entries to some source different from the currently considered
target" and whatever remains at the end is blamed on the current target.
And every such addition has its own algorithm and functions.

At any rate: I'll just propose the thing as one piece first and then we
can still discuss what kind of subdivision and/or
commenting/restructuring might be warranted and would simplify
reviewing.

-- 
David Kastrup

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html