Re: [RFC/PATCH] diff: funcname and word patterns for perl

Jonathan Nieder <jrnieder@xxxxxxxxx> · Sun, 26 Dec 2010 05:22:04 -0600

Thomas Rast wrote:

> I just took the laziest (and most obvious) approach possible when I
> wrote the original patterns.  I think the second-laziest one
> would be to observe that bit patterns for leading characters are
> always 11.., while those for continuation chars are 10..
> 
> So that gives
> 
>   |[\xc0-\xff][\x80-\xbf]+

Yes, that's what I was thinking of.  v2 will be a two-part series
starting with that.
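
For illustration, the byte classes behind that pattern can be written
out in C.  This is only a minimal sketch, not code from the patch; the
helper names (is_lead_byte, is_cont_byte, utf8_seq_len) are
hypothetical, and like the pattern itself it does not try to reject
invalid UTF-8.

```c
#include <stddef.h>

/* Lead bytes have the top two bits set to 11, i.e. \xc0-\xff. */
static int is_lead_byte(unsigned char c)
{
	return (c & 0xc0) == 0xc0;
}

/* Continuation bytes have the top two bits set to 10, i.e. \x80-\xbf. */
static int is_cont_byte(unsigned char c)
{
	return (c & 0xc0) == 0x80;
}

/*
 * Hypothetical helper: number of bytes that [\xc0-\xff][\x80-\xbf]+
 * would match at the start of s (n bytes long), or 0 if it would not
 * match there.
 */
size_t utf8_seq_len(const char *s, size_t n)
{
	size_t i;

	if (n < 2 || !is_lead_byte(s[0]) || !is_cont_byte(s[1]))
		return 0;
	for (i = 2; i < n && is_cont_byte(s[i]); i++)
		; /* consume the remaining continuation bytes */
	return i;
}
```

A lone continuation byte, or a lead byte with nothing after it, yields
0, which mirrors the "+" in the pattern requiring at least one
continuation byte.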

BTW, the perl token matcher is pretty half-hearted.  In part this is
because "only perl can parse perl" [1] terrifies me and in part it is
because I am too lazy to write down the state machine implied by
PPI/Token/*.pm.

If some tokenization wizard would like to work on it, something like
the following might produce more pleasant word diffs:

	"[%&$][[:space:]]*[0-9]+"	/* $1 */
	"|[%&$][[:space:]]*([[:alpha:]_']|::)([[:alnum:]_']|::)*"	/* $var1 */
	"|[%&$][[:space:]]*\\$([[:alnum:]_]|::)([[:alnum:]_']|::)*"	/* $$var1 */
	"|[%&$][[:space:]]*\\$\\{"     /* $${ introducing complicated expression */
	"|[%&$][[:space:]]*\\$\\$"     /* $$$ introducing complicated expression */
	"|[%&$][[:space:]]*[^[:alnum:]_:'^$]"	/* $! */
	"|[%&$][[:space:]]*\\^[][A-Z\\^_?]"	/* $^A */
	"|[%&$][[:space:]]*\\{\\^[][A-Z\\^_?]\\}"	/* ${^A} */
	"|[%&$][[:space:]]*\\{\\^[][A-Z\\^_?][[:alnum:]_]*\\}" /* ${^Foo} */
	/* ${var} */
	"|[%&$][[:space:]]*\\{[[:space:]]*([[:alpha:]_']|::)[[:alnum:]_:]*[[:space:]]*\\}"
	"|[%&$][[:space:]]*\\{"	/* ${ introducing complicated expression */
	...

though it is an unmaintainable mess. :)
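
To try a couple of these alternatives in isolation, something like the
following might do.  It is a rough sketch that assumes POSIX <regex.h>
with extended syntax; token_len is a hypothetical helper, not a
function in git, and only the first two alternatives ($1 and $var1)
are included.

```c
#include <regex.h>
#include <stddef.h>

/* First two alternatives from the list above, as a POSIX ERE. */
static const char perl_var_re[] =
	"[%&$][[:space:]]*[0-9]+"	/* $1 */
	"|[%&$][[:space:]]*([[:alpha:]_']|::)([[:alnum:]_']|::)*";	/* $var1 */

/*
 * Hypothetical helper: length of the token matched at the very start
 * of text, or -1 if the pattern fails to compile or no token starts
 * there.
 */
int token_len(const char *text)
{
	regex_t re;
	regmatch_t m;
	int len = -1;

	if (regcomp(&re, perl_var_re, REG_EXTENDED))
		return -1;
	/* regexec reports the leftmost match; accept it only at offset 0 */
	if (!regexec(&re, text, 1, &m, 0) && m.rm_so == 0)
		len = (int)(m.rm_eo - m.rm_so);
	regfree(&re);
	return len;
}
```

With this, token_len("$foo::bar = 1") should cover the whole of
$foo::bar as one word, while text that does not begin with a sigil
gives -1.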

[1] perl::toke.c and http://www.perlmonks.org/?node_id=44722