Re: [GSOC][PATCH] userdiff: add support for Scheme

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 28-Mar-2021, at 04:20, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> 
> Atharva Raykar <raykar.ath@xxxxxxxxx> writes:
> 
>> diff --git a/t/t4018/scheme-define-syntax b/t/t4018/scheme-define-syntax
>> new file mode 100644
>> index 0000000000..603b99cea4
>> --- /dev/null
>> +++ b/t/t4018/scheme-define-syntax
>> @@ -0,0 +1,8 @@
>> +(define-syntax define-test-suite RIGHT
>> +  (syntax-rules ()
>> +    ((_ suite-name (name test) ChangeMe ...)
>> +     (define suite-name
>> +       (let ((tests
>> +              `((name . ,test) ...)))
>> +         (lambda ()
>> +           (ChangeMe 'suite-name tests)))))))
>> \ No newline at end of file
> 
> Is there a good reason to leave the final line incomplete?  If there
> isn't, complete it (applies to other newly-created files in the patch).

Will do.

>> diff --git a/userdiff.c b/userdiff.c
>> index 3f81a2261c..c51a8c98ba 100644
>> --- a/userdiff.c
>> +++ b/userdiff.c
>> @@ -191,6 +191,14 @@ PATTERNS("rust",
>> 	 "[a-zA-Z_][a-zA-Z0-9_]*"
>> 	 "|[0-9][0-9_a-fA-Fiosuxz]*(\\.([0-9]*[eE][+-]?)?[0-9_fF]*)?"
>> 	 "|[-+*\\/<>%&^|=!:]=|<<=?|>>=?|&&|\\|\\||->|=>|\\.{2}=|\\.{3}|::"),
>> +PATTERNS("scheme",
>> +         "^[\t ]*(\\(define-?.*)$",
> 
> Didn't "git diff HEAD" before committing (or "git show") highlighted
> these whitespace errors?
> 
> .git/rebase-apply/patch:183: indent with spaces.
>         "^[\t ]*(\\(define-?.*)$",
> .git/rebase-apply/patch:184: trailing whitespace, indent with spaces.
>         /* 
> .git/rebase-apply/patch:185: indent with spaces.
>          * Scheme allows symbol names to have any character,
> .git/rebase-apply/patch:186: indent with spaces.
>          * as long as it is not a form of a parenthesis.
> .git/rebase-apply/patch:187: indent with spaces.
>          * The spaces must be escaped.
> warning: squelched 2 whitespace errors
> warning: 7 lines applied after fixing whitespace errors.

It did highlight the spaces (which I accidentally overlooked), but I
didn’t receive these warnings. It shows up with the --check flag though.
I'll recheck my configuration. Thanks for pointing this out.

> 
>> +         /* 
>> +          * Scheme allows symbol names to have any character,
>> +          * as long as it is not a form of a parenthesis.
>> +          * The spaces must be escaped.
>> +          */
>> +         "(\\.|[^][)(\\}\\{ ])+"),
> 
> One or more "dot or anything other than SP or parentheses"?  But
> a dot "." is neither a space or any {bra-ce} letter, so would the
> above be equivalent to
> 
> 	"[^][()\\{\\} \t]+"
> 
> I wonder...

A backslash is allowed in scheme identifiers, and I erroneously thought that
the first part handles the case for identifiers such as `component\new` or 
`\"id-with-quotes\"`. (I tested it with a regex engine that behaves differently
than the one git is using, my bad.)

> I am also trying to figure out what you wanted to achieve by
> mentioning "The spaces must be escaped.".  Did you mean something
> like (string->symbol "a symbol with SP in it") is a symbol?  Even
> so, I cannot quite guess the significance of that fact wrt the
> regexp you added here?

I initially tried using identifiers like `space\ separated` and they
seemed to work in my REPL, but turns out space separated identifiers in
scheme do not work with backslashes, and it was working because of the way
my terminal handled escaping. Space separated identifiers are declared like
`|space separated|` and this too only seems to work with Racket, not
the other Scheme implementations. So I stand corrected here, and it's better
to drop this feature altogether.

But somehow, the regexp you suggested, ie:

	"[^][()\\{\\} \t]+"

does not handle the case of make\foo -> make\bar (it will only diff on foo).
I am not too sure why it treats backslashes as delimiters.

This seems to actually do what I was going for:

	"(\\\\|[^][)(\\}\\{ ])+"

> As we are trying to catch program identifiers (symbols in scheme)
> and numeric literals, treating any group of non-whitespace letters
> that is delimited by one or more whitespaces as a "word" would be a
> good first-order approximation, but in addition, as can be seen in
> an example like (a(b(c))), parentheses can also serve as such "word
> delimiters" in addition to whitespaces.  So the regexp given above
> makes sense to me from that angle, especially if you do not limit
> the whitespace to only SP, but include HT (\t) as well.  But was
> that how you came up with the regexp?

Yes, this is exactly what I was trying to express. All words should be
delimited by either whitespace or a parenthesis, and all other special
characters should be accepted as part of the word.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux