Re: [PATCH] userdiff: add support for Emacs Lisp

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 16 2021, Protesilaos Stavrou wrote:

> On 2021-02-14, 19:25 +0100, Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>
>> On Sun, Feb 14 2021, Johannes Sixt wrote:
>>
>>> Am 14.02.21 um 02:41 schrieb Ævar Arnfjörð Bjarmason:
>>>> Just a cursory "git log -p -- lisp" in emacs.git with your patch shows
>>>> e.g. lisp/thingatpt.el where forms in a "defun" aren't indented (before
>>>> it selects the "defun", after with yours it's a "put" in the middle of
>>>> the function).
>>>
>>> Note that negative matches can be specified. We use the feature in the
>>> cpp case to exclude public:/protected:/private: and label: that happen 
>>> to be not indented. Perhaps that can be useful here?
>>>
>>> Oh, and BTW, what the patterns treat as "function" must not match what
>>> the language treats as function. The purpose of the hunk header is to 
>>> spot a place in the source file easily. So, it does not hurt if
>>> eval-and-compile forms are captured (as was mentioned in the linked 
>>> thread) if desired.
>>
>> Right, so having lots of test-case is helpful, e.g. for elisp maybe you
>> have a top-level defun, maybe not, maybe the top-level is a "(progn" so
>> you'd like a second-level more meaningful context, or maybe not...
>>
>> Obviously these userdiff patterns aren't a general parser and will
>> always be hit-and-miss, it's just useful to at least eyeball them
>> against in-the-wild test data to check their sanity & add some tests.
>
> Our intent with this patch is to rely on a heuristic that works for most
> cases.  There will always be some case that is not covered.  This point
> was made explicit in the relevant emacs-devel thread.

Yes these heuristics will always be bad in some cases. I'm just
encouraging you to find some of the more common cases, make test-cases
out of them, to say "this is what I want".

The most common difference I've seen is that by finding comments it
means for changes like this:
    
    (def foo ()
         "..."
         (some-code)
    ;; comments inside functions are often not indented
       (blah))
    
    (def bar ()
         "..."
         CHANGE_ME)

Before we'd find "foo" as the context, now we find ";; comments inside
functions are often not indented".

There's other cases where that comment rule does the right
thing. E.g. finding the description of the file at the top, but we could
also capture that with:

    ^(;;; [^.]+\.el ---.*)'

Then you'd get things like:

    ;;; button.el --- clickable buttons

Without overmatching any and all comments. Or maybe it isn't
overmatching and is better in the common case. I don't know.

Another example is now we'll match "(progn", sometimes that's "right",
sometimes "wrong". Perhaps even in cases where that's "right" the common
"(progn" pattern doesn't provide much/any extra information, and we'd be
better to skip past it with a negative rule to the next "(def" ?

I'm typing this in mu4e, but I wouldn't call myself deeply familiar with
Elisp. My reading of the linked thread / blog posts it links to is that
some other people came up with more narrow matching rules, presumably
because they found some of these cases.

So all I'm suggesting is maybe pro-actively find those cases & loop
those people in.

>> My cursory glance at the emacs.git version v.s. what's being submitted
>> here is that this one does a worse job in *very common* cases.
>>
>>>> Yours also changes it from e.g.:
>>>>      @@ -61,7 +61,7 @@ forward-thing
>>>> to:
>>>>      @@ -61,7 +61,7 @@ (defun forward-thing (thing &optional n)
>>>> Is this really desired in elisp?
>
> I think this is a matter of perspective and indeed a valid reason for
> doing this in emacs.git first.
>
> For my part, I find the verbose style more informative, especially when
> it captures the previous form instead of the one directly around the
> diffed lines.
>
> Perhaps one could add to that Emacs' ability to use the same name for
> different things, e.g. a function and a variable, where verbosity of
> this sort can help with disambiguation.  Granted, we should not really
> on such a heading style to aid us in that task, though it may be
> something to consider.

I'm not saying it needs to be done in emacs.git first, just maybe
coordinate with people who wrote/use that rule.

I think matching the whole line makes sense (sans leading whitespace),
we do it for the rest of the git.git rules.

>>> It's common practice to extract the entire line (sans indentation if
>>> applicable), not just the function name. Why would somebody want the 
>>> latter? It doesn't carry as much information as could be possible.
>>
>> Because I'm familiar with the codebase I'm editing and I just need to
>> know that the diff I'm viewing is on the "main" function, not that it's
>> "main()", "int main(int argv, char **argv)", "int main(const int argv,
>> const char *argv[])", or to either have a " {" or not at the end
>> depending on the project's coding style.
>
> Oftentimes we produce/read patches for projects that we are not
> necessarily acquainted with.  This includes emacs.git and the multitude
> of Elisp packages in that milieu.  An extra element of contextuality can
> do us good.
>
> It is true that familiarity with the code base will always benefit from
> succinct headings, though I feel that whenever we are in doubt we should
> err on the side of caution: which means that we must not introduce such
> an assumption to the workings of this piece of functionality.
>
>> I know our own userdiff builtin patterns don't do this, but it would be
>> very useful to retrofit this capability / maybe make it a configurable
>> feature, i.e. have them capture the meaningful part of the line, and you
>> could either print it all, or just that part.
>
> It would be nice to have an option that toggles verbosity.  Though I
> guess this lies outside the scope of the patch in question.

Yes I agree that this is completely outside the scope of adding elisp
support to userdiff.

I was just replying generally to Johaness on the topic of why someone
would want a pattern to match $1 over $0, as most of our git.git
patterns do (or if it's $1, it's the whole line anyway, sans leading
whitespace).

> In conclusion, I think we should decide on the next step forward: if you
> think this should be applied to emacs.git before making its way to git
> itself, then we can move the discussion there.

I think just applying a version to this to git.git is fine, and it
should not wait for whatever 20-some patches test mechanism rewrite I
started. I can rebase on top of this.

B.t.w. emacs.git also has a rule for *.texi, perhaps you'd be interested
in hacking that up too? :)




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux