Re: [BUG-ish] diff compaction heuristic false positive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jun 10, 2016 at 9:25 AM, Stefan Beller <sbeller@xxxxxxxxxx> wrote:
> On Fri, Jun 10, 2016 at 8:56 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> Jeff King <peff@xxxxxxxx> writes:
>>
>>> On Fri, Jun 10, 2016 at 03:50:43AM -0400, Jeff King wrote:
>>>
>>>> I found a false positive with the new compaction heuristic in v2.9:
>>>> [...]
>>>
>>> And by the way, this is less "hey neat, I found a case" and more "wow,
>>> this is a lot worse than I thought".
>>>
>>> I diffed the old and new output for the top 10,000 commits in this
>>> particular ruby code base. There were 45 commits with changed diffs.
>>> Spot-checking them manually, a little over 1/3 of them featured this bad
>>> pattern. The others looked like strict improvements.
>>>
>>> That's a lot worse than the outcomes we saw on other code bases earlier.
>>> 1/3 bad is still a net improvement, so I dunno. Is this worth worrying
>>> about? Should we bring back the documentation for the knob to disable
>>> it? Should we consider making it tunable via gitattributes?
>>>
>>> I don't think that last one really helps; the good cases _and_ the bad
>>> ones are both in ruby code (though certainly the C code we looked at
>>> earlier was all good).
>>>
>>> It may also be possible to make it Just Work by using extra information
>>> like indentation. I haven't thought hard enough about that to say.
>>>
>>> -Peff
>>
>> I recall saying "we'd end up being better in some and worse in
>> others" at the very beginning.  How about toggling the default back
>> for the upcoming release, keeping the experimentation knob in the
>> code, and try different heuristics like the "indentation" during the
>> next cycle?
>
> Sure. I thought about for a while now and by now I agree with Junio.
> No matter what kind of heuristic we can come up with it is easy to construct
> a counter example.
>
> That said, let's try the indentation thing, though I suspect
> one of the early motivating examples (an excerpt from a  kernel config file)
> would not do well with it, as it had not an indentation scheme as programming
> languages do.
>
> Thanks,
> Stefan

I think we could use the indentation trick and it might help in this
case. I agree, let's disable this for this cycle and experiment in the
next one. Good catch, Peff.

As others have said you will always be able to produce counter
examples, that's the nature of heuristics. The idea is to see if we
can come up with something simple that mostly improves the output,
even if sometimes it might have a negative impact on the outputs. But
I think we should avoid changing behavior unless it's mostly an
improvement.

Regards,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]