Re: Revision walking, commit dates, slop

Jakub Narebski <jnareb@xxxxxxxxx> · Thu, 23 May 2019 23:04:49 +0200

Derrick Stolee <stolee@xxxxxxxxx> writes:
> On 5/22/2019 2:29 PM, Jakub Narebski wrote:
>> Derrick Stolee <stolee@xxxxxxxxx> writes:
>>> On 5/20/2019 7:27 PM, Jakub Narebski wrote:
>>
>> Restating it yet again:
>> 
>>    A.  corrected_date(C) = max(committer_date(C),
>>                                max_P(committer_date(P) + offset(P)) + 1)
>> 
>>    B.  offset(C) = max(corrected_date(C) - committer_date(C),
>>                        max_P(offset(P)) + 1)
>
> The problem with this definition is that it "defines" the corrected date, and
> then _adjusts_ it by updating the offset.

For me equation (A) was just an intermediate step; we might call it
adjusted_date(C) or temp_date(C) instead.

Note that with the following adjustment

          corrected_date(C) = max(committer_date(C),
                                  max_P(corrected_date(P)) + 1)

it is +1 corrected "V3: Corrected Commit Date" from gen-test.

> I consider
>
> 	corrected_date(C) = committer_date(C) + offset(C)
>
> to be part of the definition. You could restate the definition as follows:
>
> 	corrected_date = max(committer_date(C) + max_P(offset(P)) + 1,
>         	             max_P(corrected_date(P)))
>
> or, equivalently
>
> 	corrected_date = max(committer_date(C) + max_P(offset(P)) + 1,
>         	             max_P(committer_date(P) + offset(P)))
>
> This definition, in a single step, satisfies the conditions below:
>
>> 
>>> The final definition needs two conditions on the offset of a commit C for
>>> every parent P:
>>>
>>>  1. committer_date(C) + offset(C) > committer_date(P) + offset(P)
>>>  2. offset(C) > offset(P)

I think it is easier to prove the conditions (1) and (2) using two-step
definition (A) + (B), as I have shown in previous email.

Also, what we need to calculate and store is offset(C), not
offset_date(C) (i.e. corrected_date(C), if you prefer).

> Plus, the "+ 1" in the first step takes into account that "0" is a special offset
> value in the commit-graph file format meaning "not computed".

That assumes that max_P(function(P)) over empty set of P is taken to be
zero.  Also, I think two step definition has the same property.

>> Well, we should check/test if performance benefits of "offset date"
>> ("corrected date with rising offset") truly holds.
>
> Yes, a full performance test will be required. I have full confidence that the
> monotonic offset requirement will have only positive effect. That is, it will
> not affect the case where committer-date was better than generation number, but will
> help the cases where all the committer-dates are equal.

I worry that monotonic offset corrected date would behave like
topological levels (i.e. like current generation number v1) in more
cases rather than like commit date heuristics.

P.S. there is _theoretical_ problem with all date-offset based
generation numbers (slightly more likely to occur for monotonical
offset), namely that in the commit-graph format v1 we have 34 bits for
commit date timestamp, but only 30 bits for offset; and there might be
the case where offset do not fit in 30 bits.  It would be however very
unlikely, e.g. commit date of 2028, then commit date of 1970 (timestamp
equal to zero).

Regards,
--
Jakub Narębski