Re: [PATCH v2] userdiff: support Markdown

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 30 Apr 2020 10:31:29 -0700

Ash Holland <ash@xxxxxxxxx> writes:

> It's typical to find Markdown documentation alongside source code, and
> having better context for documentation changes is useful; see also
> commit 69f9c87d4 (userdiff: add support for Fountain documents,
> 2015-07-21).
>
> The pattern is based on the CommonMark specification 0.29, section 4.2:
> https://spec.commonmark.org/
>
> Only ATX headings are supported, as detecting setext headings would
> require printing the line before a pattern matches, or matching a
> multiline pattern. The word-diff pattern is the same as the pattern for
> HTML, because many Markdown parsers accept inline HTML.

> +PATTERNS("markdown",
> +	 "^ {0,3}#{1,6}( .*)?$",

This is "possibly just a bit indented run of up to 6 hashes, either
ending the line by itself or if some text follows, there must be a
SP after the hashes".

If I had a line that has a hash, HT and then "Hello, world", would
everybody's markdown implementation reject it as a header, because
the whitespace after the run of hashes is not a SP?

Also, allowing a line with only the hashes might be spec-compliant, but how
useful would it be to see just a sequence of 4 hashes without any
text after "@@ -100,5 +100,6 @@" in the diff output?

Taking all that together, my suspicion is

	"^ {0,3}#{1,6}[ \t]"

i.e. "possibly slightly indented run of up to 6 hashes, followed by a
whitespace, to catch the headers with real contents and nothing else"
might be more practically useful.  I dunno.

> +	 "[^<>= \t]+"),

This does match the one for HTML.

In any case, let me queue this v2 as-is and see what happens.

Thanks.