Re: [GSoC][PATCH v2 1/1] userdiff: add support for scheme

Hi Atharva

On 03/04/2021 14:16, Atharva Raykar wrote:
Add a diff driver for Scheme-like languages which recognizes top level
and local `define` forms, whether they are function definitions,
bindings, syntax definitions or user-defined `define-xyzzy` forms.

Also supports R6RS `library` forms, `module` forms along with class and
struct declarations used in Racket (PLT Scheme).

Alternate "def" forms such as those in Gerbil Scheme are also
supported, like defstruct, defsyntax and so on.

`define` forms were picked for the hunk headers because they are usually
the only significant forms for defining the structure of the program.
It is also a common pattern for schemers to have local function
definitions to hide their visibility, so it is not only the top level
`define`s that are of interest. Schemers also extend the language with
macros to provide their own define forms (for example, something like a
`define-test-suite`), which are also captured in the hunk header.

Since it is common practice to extend syntax with variants of a form
like `module+`, `class*` etc, those are supported as well.

The word regex is a best-effort attempt to conform to R6RS[1] valid
identifiers, symbols and numbers.

[1] http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_chap_4

Signed-off-by: Atharva Raykar <raykar.ath@xxxxxxxxx>
[...]
diff --git a/userdiff.c b/userdiff.c
index 3f81a2261c..ac1999bbc5 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -191,6 +191,10 @@ PATTERNS("rust",
  	 "[a-zA-Z_][a-zA-Z0-9_]*"
  	 "|[0-9][0-9_a-fA-Fiosuxz]*(\\.([0-9]*[eE][+-]?)?[0-9_fF]*)?"
  	 "|[-+*\\/<>%&^|=!:]=|<<=?|>>=?|&&|\\|\\||->|=>|\\.{2}=|\\.{3}|::"),
+PATTERNS("scheme",
+	 "^[\t ]*(\\(((define|def(struct|syntax|class|method|rules|record|proto|alias)?)[-*/ \t]|(library|module|struct|class)[*+ \t]).*)$",
+	 /* All words should be delimited by spaces or parentheses */
+	 "([^][)(}{[ \t])+"),

I think it would be nice to match single '(' and '[' so that they are highlighted when they have been added or deleted; I find this useful when I get a syntax error. It would also be nice to handle R7RS identifiers like | this is a symbol |. Maybe something like
"(\\|([^\\\\|]*(\\\\|)*)*\\||[^][}{)( \t]|[][(){}])"
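
For anyone who wants to poke at the two patterns from the patch outside of git, here is a rough sketch using POSIX regcomp()/regexec() with REG_EXTENDED. The pattern strings are copied from the patch as posted; the helper names is_hunk_header() and split_words() are made up for this example and are not part of git. It only exercises the patterns in isolation, so git's own matching may differ in the details.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hunk-header pattern, copied from the patch as posted. */
static const char *FUNCNAME_RE =
	"^[\t ]*(\\(((define|def(struct|syntax|class|method|rules|record|proto|alias)?)[-*/ \t]|(library|module|struct|class)[*+ \t]).*)$";

/* Word pattern, copied from the patch as posted. */
static const char *WORD_RE = "([^][)(}{[ \t])+";

/*
 * Hypothetical helper (not a git function): returns 1 if `line`
 * matches the hunk-header pattern, 0 if it does not, and -1 if the
 * pattern fails to compile.
 */
static int is_hunk_header(const char *line)
{
	regex_t re;
	int rc;

	if (regcomp(&re, FUNCNAME_RE, REG_EXTENDED | REG_NOSUB))
		return -1;
	rc = regexec(&re, line, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}

/*
 * Hypothetical helper (not a git function): scans `text` for
 * successive matches of WORD_RE and copies at most `max` of them into
 * `out`, truncating each to 63 bytes. Returns the number of words
 * stored, or -1 if the pattern fails to compile.
 */
static int split_words(const char *text, char out[][64], int max)
{
	regex_t re;
	regmatch_t m;
	int n = 0;

	if (regcomp(&re, WORD_RE, REG_EXTENDED))
		return -1;
	while (n < max && *text && regexec(&re, text, 1, &m, 0) == 0) {
		size_t len = (size_t)(m.rm_eo - m.rm_so);

		if (len > 63)
			len = 63;
		memcpy(out[n], text + m.rm_so, len);
		out[n][len] = '\0';
		n++;
		text += m.rm_eo;	/* continue after this match */
	}
	regfree(&re);
	return n;
}
```

With these, is_hunk_header("(module+ test") reports a match while is_hunk_header("(display \"hi\")") does not, and split_words() on "(define (add x y)" stores the four words define, add, x and y, with the parentheses falling outside every match of this pattern.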

Best Wishes

Phillip

  PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
  	 "[={}\"]|[^={}\" \t]+"),
  PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",




