Re: [GSOC][PATCH] userdiff: add support for Scheme

Johannes Sixt <j6t@xxxxxxxx> · Mon, 5 Apr 2021 19:58:28 +0200

Am 05.04.21 um 12:04 schrieb Phillip Wood:
> Hi Atharva
> 
> On 30/03/2021 11:22, Atharva Raykar wrote:
>>
>>
>>> On 30-Mar-2021, at 12:34, Atharva Raykar <raykar.ath@xxxxxxxxx> wrote:
>>>
>>>
>>>
>>>> On 29-Mar-2021, at 15:48, Phillip Wood <phillip.wood123@xxxxxxxxx>
>>>> wrote:
>>>>
>>>> Hi Atharva
>>>>
>>>> On 28/03/2021 13:23, Atharva Raykar wrote:
>>>>> On 28-Mar-2021, at 05:16, Johannes Sixt <j6t@xxxxxxxx> wrote:
>>>>> [...]
>>>>>>> diff --git a/t/t4018/scheme-local-define
>>>>>>> b/t/t4018/scheme-local-define
>>>>>>> new file mode 100644
>>>>>>> index 0000000000..90e75dcce8
>>>>>>> --- /dev/null
>>>>>>> +++ b/t/t4018/scheme-local-define
>>>>>>> @@ -0,0 +1,4 @@
>>>>>>> +(define (higher-order)
>>>>>>> +  (define local-function RIGHT
>>>>>>
>>>>>> ... this one, which is also indented and *is* marked as RIGHT.
>>>>> In this test case, I was explicitly testing for an indented '(define'
>>>>> whereas in the former, I was testing for the top-level
>>>>> '(define-syntax',
>>>>> which happened to have an internal define (which will inevitably
>>>>> show up
>>>>> in a lot of scheme code).
>>>>
>>>> It would be nice to include indented define forms but including them
>>>> means that any change to the body of a function is attributed to the
>>>> last internal definition rather than the actual function. For example
>>>>
>>>> (define (f arg)
>>>> (define (g x)
>>>>    (+ 1 x))
>>>>
>>>> (some-func ...)
>>>> ;;any change here will have '(define (g x)' in the hunk header, not
>>>> '(define (f arg)'
>>>
>>> The reason I went for this over the top level forms, is because
>>> I felt it was useful to see the nearest definition for internal
>>> functions that often have a lot of the actual business logic of
>>> the program (at least a lot of SICP seems to follow this pattern).
>>> The disadvantage is as you said, it might also catch trivial inner
>>> functions and the developer might lose context.
>>
>> Never mind this message, I had misunderstood the problem you were
>> trying to
>> demonstrate. I wholeheartedly agree with what you are trying to say, and
>> the indentation heuristic discussed does look interesting. I shall have a
>> glance at the RFC you linked in the other reply.
>>
>>> The disadvantage is as you said, it might also catch trivial inner
>>> functions and the developer might lose context.
>>
>> Feel free to disregard me misquoting you here. You did not say that (:
>>
>>> Another problem is it may match more trivial bindings, like:
>>>
>>> (define (some-func things)
>>>   ...
>>>   (define items '(eggs
>>>                   ham
>>>                   peanut-butter))
>>>   ...)
>>>
>>> What I have noticed *anecdotally* is that this is not common enough
>>> to be too much of a problem, and local define bindings seem to be more
>>> favoured in Racket than other Schemes, that use 'let' more often.
>>>
>>>> I don't think this can be avoided as we rely on regexs rather than
>>>> parsing the source so it is probably best to only match toplevel
>>>> defines.
>>>
>>> The other issue with only matching top level defines is that a
>>> lot of scheme programs are library definitions, something like
>>>
>>> (library
>>>     (foo bar)
>>>   (export ...)
>>>   (define ...)
>>>   (define ...)
>>>   ;; and a bunch of other definitions...
>>> )
>>>
>>> Only matching top level defines will completely ignore matching all
>>> the definitions in these files.
>>
>> That said, I still stand by the fact that only catching top level defines
>> will lead to a lot of definitions being ignored. Maybe the occasional
>> mismatch may be worth the gain in the number of function contexts being
>> detected?
> 
> I'm not sure that the mismatches will be occasional - every time you
> have an internal definition in a function the hunk header will be wrong
> when you change the main body of the function. This will affect grep
> --function-context and diff -W as well as the normal hunk headers. The
> problem is there is no way to avoid that and provide something useful in
> the library example you have above. It would be useful to find some code
> bases and diff the output of 'git log --patch' with and without the
> leading whitespace match in the function pattern to see how often this
> is a problem (i.e. when the funcnames do not match see which one is
> correct).

--function-context is just one application of the function matcher. To
work properly with nested function definitions, it would have to
understand the nesting. But it does not; there is nothing that we can do
about it without a proper language parser. Therefore, the argument that
the matcher does not work well with --function-context for nested
functions is of little relevance.

IMO, the primary concern should be whether the matcher decorates hunk
contexts sufficiently well.

-- Hannes