Re: [GSOC][PATCH] userdiff: add support for Scheme

Hi Atharva

On 28/03/2021 12:51, Atharva Raykar wrote:
On 28-Mar-2021, at 04:20, Junio C Hamano <gitster@xxxxxxxxx> wrote:

Atharva Raykar <raykar.ath@xxxxxxxxx> writes:

+         /*
+          * Scheme allows symbol names to have any character,
+          * as long as it is not a form of a parenthesis.
+          * The spaces must be escaped.
+          */
+         "(\\.|[^][)(\\}\\{ ])+"),

One or more "dot or anything other than SP or parentheses"?  But
a dot "." is neither a space nor any {bra-ce} letter, so would the
above be equivalent to

	"[^][()\\{\\} \t]+"

I wonder...

A backslash is allowed in scheme identifiers, and I erroneously thought that
the first part handles the case for identifiers such as `component\new` or
`\"id-with-quotes\"`. (I tested it with a regex engine that behaves differently
than the one git is using, my bad.)

I am also trying to figure out what you wanted to achieve by
mentioning "The spaces must be escaped.".  Did you mean something
like (string->symbol "a symbol with SP in it") is a symbol?  Even
so, I cannot quite guess the significance of that fact wrt the
regexp you added here?

I initially tried using identifiers like `space\ separated` and they
seemed to work in my REPL, but it turns out that space-separated identifiers
in Scheme do not work with backslashes; it only appeared to work because of
the way my terminal handled escaping. Space-separated identifiers are declared
like `|space separated|`, and this too only seems to work with Racket, not
the other Scheme implementations.

I think the bar notation also works with some other implementations, such as Gambit and possibly Guile (it's been a while since I used the latter).

Best wishes

Phillip

So I stand corrected here, and it's better
to drop this feature altogether.

But somehow, the regexp you suggested, ie:

	"[^][()\\{\\} \t]+"

does not handle the case of make\foo -> make\bar (it will only diff on foo).
I am not too sure why it treats backslashes as delimiters.
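
A likely explanation, assuming the pattern is compiled as a POSIX
extended regular expression with regcomp(): inside a POSIX bracket
expression a backslash is an ordinary character, not an escape. So the
`\\{` and `\\}` in the C string put a literal backslash into the negated
set, and the backslash ends up excluded from words along with the braces.
The sketch below is only an illustration; first_word() is a hypothetical
helper written for this note, not anything from git.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy the first match of the POSIX extended
 * regex `pattern` in `text` into `out` (truncating if needed).
 * Returns the match length, or -1 if there is no match or the
 * pattern does not compile.
 */
static int first_word(const char *pattern, const char *text,
		      char *out, size_t outsz)
{
	regex_t re;
	regmatch_t m;
	int len = -1;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	if (!regexec(&re, text, 1, &m, 0)) {
		len = (int)(m.rm_eo - m.rm_so);
		if ((size_t)len >= outsz)
			len = (int)outsz - 1;
		memcpy(out, text + m.rm_so, len);
		out[len] = '\0';
	}
	regfree(&re);
	return len;
}
```

Called on the text make\foo, the bracket-only pattern
"[^][()\\{\\} \t]+" should stop at the backslash, while a pattern that
also matches an escaped backslash outside the brackets, such as
"(\\\\|[^][)(\\}\\{ ])+", should cover the whole identifier.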

This seems to actually do what I was going for:

	"(\\\\|[^][)(\\}\\{ ])+"

As we are trying to catch program identifiers (symbols in scheme)
and numeric literals, treating any group of non-whitespace letters
that is delimited by one or more whitespaces as a "word" would be a
good first-order approximation, but in addition, as can be seen in
an example like (a(b(c))), parentheses can also serve as such "word
delimiters" in addition to whitespaces.  So the regexp given above
makes sense to me from that angle, especially if you do not limit
the whitespace to only SP, but include HT (\t) as well.  But was
that how you came up with the regexp?

Yes, this is exactly what I was trying to express. All words should be
delimited by either whitespace or a parenthesis, and all other special
characters should be accepted as part of the word.
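
To make the "delimited by whitespace or a parenthesis" rule concrete,
here is a minimal sketch that splits a line into words with a POSIX
extended regex. It assumes regcomp() with REG_EXTENDED; print_words() is
a hypothetical helper, and the pattern in the note after the code
combines the backslash alternation with the HT suggested earlier in the
thread, which may not be what the final patch uses.

```c
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical helper: print every match of the POSIX extended regex
 * `pattern` in `line`, one per output line. Returns the number of
 * matches, or -1 if the pattern does not compile.
 */
static int print_words(const char *pattern, const char *line)
{
	regex_t re;
	regmatch_t m;
	int count = 0;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	/* Restart the search just past the previous match each time. */
	while (*line && !regexec(&re, line, 1, &m, 0)) {
		if (m.rm_eo == m.rm_so)
			break; /* guard against an empty match */
		printf("%.*s\n", (int)(m.rm_eo - m.rm_so), line + m.rm_so);
		line += m.rm_eo;
		count++;
	}
	regfree(&re);
	return count;
}
```

With the pattern "(\\\\|[^][)(\\}\\{ \t])+", the line (a(b(c))) should
yield the three words a, b and c, and an identifier such as make\foo
should stay in one piece.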
