Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic

Stefan Beller <sbeller@xxxxxxxxxx> · Tue, 19 Apr 2016 00:05:56 -0700

On Tue, Apr 19, 2016 at 12:00 AM, Jeff King <peff@xxxxxxxx> wrote:
> On Mon, Apr 18, 2016 at 11:47:52PM -0700, Stefan Beller wrote:
>
>> I am convinced the better way to do it is like this:
>>
>>     Calculate the entropy for each line and take the last line with the
>>     lowest entropy as the last line of the hunk.
>
> I'll be curious to see the results, but I think sometimes predictable
> and stupid may be the best route with these sorts of things. In
> particular, I'd worry that a content-independent measure of entropy
> might miss some subtleties of a particular language (e.g., that "*" is
> more or less meaningful than some other character). But we'll see. :)

I would assume that "*" has little entropy when there are lots of
comments, i.e. it just "feels" like an empty line.
If "*" is rare, then its entropy is high because it is unusual, and
unusual things should not be at the border of a hunk.
So my prediction is that the 'subtleties of a particular language' correlate
highly with the actual use of characters.

Anyway, the experiment can be carried out later. :)

Thanks,
Stefan

>
> -Peff
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html