Re: Fwd: possible Improving diff algoritm

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/12/2012 10:53 PM, Junio C Hamano wrote:
> Morten Welinder <mwelinder@xxxxxxxxx> writes:
> 
>> Is there a reason why picking among the choices in a sliding window
>> must be contents neutral?
> 
> Sorry, you might be getting at something interesting but I do not
> understand the question.  I have no idea what you mean by "contents
> neutral".
> 
> Picking between these two choices
> 
>          /**                         +    /**                         
>     +     * Default parent           +     * Default parent           
>     +     *                          +     *                          
>     +     * @var int                 +     * @var int                 
>     +     * @access protected        +     * @access protected        
>     +     * @index                   +     * @index                   
>     +     */                         +     */                         
>     +    protected $defaultParent;   +    protected $defaultParent;   
>     +                                +                                
>     +    /**                              /**                         
> 
> would not affect the correctness of the patch.  You may pick
> whatever you deem the most desirable, but your answer must be a
> correct patch (the definition of "correct" here is "applying that
> patch to the preimage produces the intended postimage").
> 
> And I think if you inserted a block of text B after a context C
> where the tail of B matches the tail of C like the above, you can
> shift what you treat as "inserted" up and still come up with a
> correct patch.

I have the feeling that a few crude heuristics would go a long way
towards improving diffs like this.  For example:

* Prefer to have an add/remove block that has balanced begin/end pairs
(where begin/end pairs might be opening and closing parentheses,
brackets, braces, and angle brackets, "/*" and "*/", and perhaps a
couple of other things.  For SGML-like text begin and end tags could be
matched up.

It would be possible to read these begin/end pairs from a
filetype-specific table or configuration setting, though this would add
complication and would also make it possible that diffs generated by two
different people are not identical if their configurations differ.

* Prefer to have a block where the first non-blank line of the block and
the first non-blank line after the block are indented by the same amount.

* Prefer to have a block with trailing (as opposed to leading or
embedded) blank lines--the more the better.

The beautiful thing is that even if the heuristics sometimes fail, the
correctness of the patch (in the sense that you have defined) is not
compromised.

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]