Re: How to substructure rewrites?

David Kastrup <dak@xxxxxxx> · Mon, 27 Jan 2014 17:27:38 +0100

Junio C Hamano <gitster@xxxxxxxxx> writes:

> David Kastrup <dak@xxxxxxx> writes:
>
>> As it can easily be guessed, the "add xxx function" commits are
>> basically adding not-yet-used code (and so will not disrupt
>> compilation), but everything starting with "Reorganize blame data
>> structures" up until the final commit will not work or compile since the
>> code does not match the data structures.
>>
>> So there is little point in substructing all that, right?  Even
>> something seemingly isolated like
>>
>> commit f64b41c472442ae9971321fe8f62c3885ba4d8b7
>> Author: David Kastrup <dak@xxxxxxx>
>> Date:   Sun Jan 19 02:16:21 2014 +0100
>>
>>     blame.c: Let output determine MORE_THAN_ONE_PATH more efficiently
>>
>> is not really useful as a separate commit since while it does implement
>> a particular task, this is done starting with non-working code relying
>> on no-longer existent data structures.
>
> Small pieces that are incrementally added with their own
> documentation would certainly be a lot easier to read than one big
> ball of wax.

Sure.  The problem is that my rewrite is characterized by doing as
little as possible in order to achieve identical results (with the
conceivable exception of picking a different, equally scored variant in
those parts of the algorithm choosing a maximum).  That also means that
the basic logic and layout of the program stays the same while the data
flow and parts of the data structures are replaced.

> I am wondering if it would make it easier for everybody to tentatively
> do "git-blame vs git-blame2" dance here, just like we did "git-blame
> vs git-annotate" dance some years ago.  That is, to add a completely
> new command and have them in parallel while cooking in 'next' (or we
> could even keep them in a few releases if we are not absolutely
> certain about the correctness of the result of the new code), aiming
> to eventually retire the current implementation and replace it with
> the new one.  We have already have test infrastructure to allow us to
> run variants of blames, too, to help that kind of transition.

Well, the point is that the implementation is supposed to
a) deliver identical results
b) reuse as much code as possible
so there is no real point in working with a separate source file.

For the "if we are not absolutely certain about the correctness of the
result of the new code" angle, this should be covered with the usual
stable/unstable/proposed division most projects have in some way or
another for quality assurance.  I have absolutely no clue how Git
organizes that, but it would usually mean that the new code is not
placed in a different _file_ (or a differently named command) but rather
in a different _branch_ as compared with the current implementation.

>> In general, the rule is likely "any commit should not create a
>> non-working state" right?
>
> Yes.

My current aim is to complete the code to the point where it is
a) fully operative and delivering equivalent results to the current
implementation
b) in every aspect at least as efficient as the current implementation
and in a state that is not basically less comprehensible than what I
started with

Since the change of the data structures and data flow requires changing
all affected program parts to get to a working state, and since I don't
have ambitions to do more than that which is required to get there,
I don't see how the bulk of the work can sensibly avoid coming as one
"omnibus" patch.  Most changes, however, will be understandable quite
well locally.

For example, currently the code has a number of loops traversing one
global linked list, ignoring all entries not relevant to a particular
target, and doing something with the rest.  Those loops generally are
replaced with a simpler loop just running through a single _completely_
relevant linked list.  Even while those replacements are scattered
throughout the patch, they make sense without having to look at the rest
of the patch.

-- 
David Kastrup

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html