Re: [PATCH] userdiff: add Julia to supported userdiff languages

On Friday, January 10, 2020 11:43 AM, Johannes Sixt <j6t@xxxxxxxx> wrote:

> Am 10.01.20 um 04:10 schrieb Ryan Zoeller via GitGitGadget:
>
> > Add xfuncname and word_regex patterns for Julia,
> > which is a language used in numerical analysis and
> > computational science.
> > The default behavior for xfuncname did not allow
> > functions to be indented, nor functions to have a
> > macro applied, such as @inline or @generated.
> >
> > Signed-off-by: Ryan Zoeller <rtzoeller@xxxxxxxxxxxxx>
> >
> > ----------------------------------------------------
> >
> > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-521%2Frtzoeller%2Fjulia_userdiff-v1
> > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-521/rtzoeller/julia_userdiff-v1
> > Pull-Request: https://github.com/gitgitgadget/git/pull/521
> > Documentation/gitattributes.txt | 2 ++
> > t/t4018-diff-funcname.sh | 1 +
> > t/t4018/julia-function | 5 +++++
> > t/t4018/julia-indented-function | 8 ++++++++
> > t/t4018/julia-inline-function | 5 +++++
> > t/t4018/julia-macro | 5 +++++
> > t/t4018/julia-mutable-struct | 5 +++++
> > t/t4018/julia-struct | 5 +++++
> > userdiff.c | 15 +++++++++++++++
> > 9 files changed, 51 insertions(+)
> > create mode 100644 t/t4018/julia-function
> > create mode 100644 t/t4018/julia-indented-function
> > create mode 100644 t/t4018/julia-inline-function
> > create mode 100644 t/t4018/julia-macro
> > create mode 100644 t/t4018/julia-mutable-struct
> > create mode 100644 t/t4018/julia-struct
>
> The tests all look good.
>
> > diff --git a/userdiff.c b/userdiff.c
> > index efbe05e5a5..b5e938b1c2 100644
> > --- a/userdiff.c
> > +++ b/userdiff.c
> > @@ -79,6 +79,21 @@ PATTERNS("java",
> > "|[-+0-9.e]+[fFlL]?|0[xXbB]?[0-9a-fA-F]+[lL]?"
> > "|[-+*/<>%&^|=!]="
> > "|--|\\+\\+|<<=?|>>>?=?|&&|\\|\\|"),
> > +PATTERNS("julia",
> > +    "^[ \t]*(((mutable[ \t]+)?struct|(@.+[ \t])?function|macro)[ \t].*)$",
>
> Looks good to me.
>
> > +    /* -- */
> > +    /* Binary literals */
> > +    "[-+]?0b[01]+"
> > +    /* Hexadecimal literals */
> > +    "|[-+]?0x[0-9a-fA-F]+"
>
> These two could be merged into
>
> /* Binary and hexadecimal literals */
> "|0[bx][0-9a-fA-F]+"

I was trying to keep `0b` followed by hex characters, e.g. 0bFF, from being recognized as a literal. This is admittedly not really a concern, so I'm fine making this change to simplify the regular expression.
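
To make the trade-off concrete, here is a minimal sketch in Python. It uses Python's `re` module rather than the POSIX ERE engine git uses (these particular patterns should behave the same in both), the sign prefix is left out, and the names `SEPARATE`, `MERGED` and `accepts` are made up for illustration:

```python
import re

# Separate binary and hexadecimal alternatives, as in the v1 patch
# (leading sign omitted for this illustration).
SEPARATE = re.compile(r"0b[01]+|0x[0-9a-fA-F]+")
# Merged form suggested in the review.
MERGED = re.compile(r"0[bx][0-9a-fA-F]+")


def accepts(pattern, text):
    """Return True if the whole string is one match of the pattern."""
    return pattern.fullmatch(text) is not None


for literal in ("0b1010", "0xFF", "0bFF"):
    print(literal, accepts(SEPARATE, literal), accepts(MERGED, literal))
```

Both forms accept the well-formed literals; only the malformed 0bFF is treated differently, and it is accepted by the merged form alone.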

>
> Note that I did not insert [-+]? at the front. Even though most if not
> all patterns allow a sign, they are usually wrong to do so, because they
> misclassify a change from 'a+1' to 'a+2' as 'a[-+1-]{++2+}' instead of
> the correct 'a+[-1-]{+2+}'.

I'm fine dropping the leading `[-+]?`.
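
For anyone following along, the effect of the optional sign can be sketched in Python. This is only a rough model of word splitting, not git's implementation: it assumes regex matches become words and any other single non-whitespace character is a word of its own, and the helper `words` plus the two toy patterns are hypothetical:

```python
import re


def words(word_regex, text):
    """Split text into words: regex matches first, then any single
    non-whitespace character as a fallback word."""
    return re.findall(rf"(?:{word_regex})|\S", text)


# Toy patterns: integers with and without an optional sign, plus identifiers.
SIGNED = r"[-+]?[0-9]+|[a-zA-Z_]+"
UNSIGNED = r"[0-9]+|[a-zA-Z_]+"

for regex in (SIGNED, UNSIGNED):
    print(words(regex, "a+1"), words(regex, "a+2"))
```

With the signed pattern the `+` is glued to the digit, so a change from a+1 to a+2 replaces the word `+1` with `+2`. Without the sign, `+` is its own unchanged word and only the digit differs.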

>
> > +    /* Real and complex literals */
> > +    "|[-+0-9.e_(im)]+"
>
> I am curious: is '(1+2i)' a single literal -- including the parentheses?
> The expression would also mistake the character sequence '-1)+(2+' as a
> single word; is it intended?

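This part of the regular expression has a fairly serious mistake: I misunderstood how the parentheses would be interpreted. Inside a bracket expression they are literal characters, so `(im)` just adds `(`, `i`, `m` and `)` to the set. It should be something along the lines of `([-+0-9.e_]|im)+`.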

Julia uses `im` as the designation for an imaginary value; this regex was intended to admit e.g. 1+2im, in addition to other numeric values such as 1_000_000 and 1e10.
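
A quick way to see the difference between the two forms is the Python sketch below. It again relies on Python's `re` instead of POSIX ERE, and `BROKEN`, `FIXED` and `is_single_word` are illustrative names, not anything from the patch:

```python
import re

# v1 pattern: the parentheses sit inside the bracket expression, so
# '(', 'i', 'm' and ')' are just extra members of the character set.
BROKEN = re.compile(r"[-+0-9.e_(im)]+")
# Proposed correction: 'im' is an alternative next to the character set.
FIXED = re.compile(r"([-+0-9.e_]|im)+")


def is_single_word(pattern, text):
    """Return True if the whole string is one match of the pattern."""
    return pattern.fullmatch(text) is not None


for text in ("1+2im", "1_000_000", "1e10", "-1)+(2+", "mi"):
    print(repr(text), is_single_word(BROKEN, text), is_single_word(FIXED, text))
```

Both patterns accept the three intended numeric forms, but only the v1 pattern also accepts the sequence '-1)+(2+' from the review and a stray 'mi'.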

>
> > +    /* Should theoretically allow Unicode characters as part of
> > +     * a word, such as U+2211. However, Julia reserves most of the
> > +     * U+2200-U+22FF range (as well as others) as user-defined operators,
> > +     * therefore they are not handled in this regex. */
> > +    "|[a-zA-Z_][a-zA-Z0-9_!]*"
> > +    "|--|\\+\\+|<<=?|>>>=?|>>=?|\\\\\\\\=?|//=?|&&|\\|\\||::|->|[-+*/<>%^&|=!$]=?"),
>
> The last sub-expression permits single-character operators in addition
> to their forms with a '=' appended (computing assignment, I presume).
> You could remove the trailing ? because single non-whitespace characters
> are always a word of their own, even if they are not caught by the word
> regexp.

Agreed, I'll drop the trailing ?.
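
As a sanity check that dropping the `?` does not change the result, here is a small Python sketch. It is a simplified model that assumes, as described in the review, that any single non-whitespace character not matched by the word regex is a word of its own. The helper `words` and the reduced patterns are hypothetical and cover only identifiers, integers and the final operator sub-expression:

```python
import re


def words(word_regex, text):
    """Split text into words: regex matches first, then any single
    non-whitespace character as a fallback word."""
    return re.findall(rf"(?:{word_regex})|\S", text)


# Reduced patterns: identifiers, integers, and the operator sub-expression
# with and without the trailing '?'.
WITH_OPTIONAL = r"[a-zA-Z_][a-zA-Z0-9_!]*|[0-9]+|[-+*/<>%^&|=!$]=?"
WITHOUT_OPTIONAL = r"[a-zA-Z_][a-zA-Z0-9_!]*|[0-9]+|[-+*/<>%^&|=!$]="

sample = "x += y * 2 - z"
print(words(WITH_OPTIONAL, sample))
print(words(WITHOUT_OPTIONAL, sample))
```

Both calls split the sample into the same words, because a lone `*` or `-` is picked up by the single-character fallback when the operator alternative requires the `=`.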

>
> > PATTERNS("matlab",
> > /*
> > * Octave pattern is mostly the same as matlab, except that '%%%' and
> > base-commit: 042ed3e048af08014487d19196984347e3be7d1c
>
> -- Hannes

Thanks for the feedback,
Ryan Zoeller




