Re: DTD

Peter Flynn <peter@xxxxxxxxxxx> · Mon, 2 Apr 2018 22:02:41 +0100

On 01/04/18 22:10, Lawrence D'Oliveiro wrote:
> On Sun, 1 Apr 2018 15:01:15 +0100, Peter Flynn wrote:
> 
>> This reflects the relatively light weight of markup needed to
>> describe the application data. It's basically a set of containers, so
>> all it needs is the names: the content is just strings.
> 
> Not for things like
> 
>     <!ELEMENT int (#PCDATA)>
>     <!ELEMENT double (#PCDATA)>
> 
> The content for these is *not* “just strings”.

But only a human can know that; it is probably documented somewhere
else. The DTD says the content is "parsed character data", which is text
with no further element markup; in effect "just strings", so that's it
as far as SGML or XML is concerned if you use a DTD alone. A W3C Schema
can constrain character data content more finely, and Schematron can
apply additional validation rules.
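
As a rough sketch only, and assuming (I don't know this for certain)
that int is meant to hold a whole number and double a floating-point
number, a W3C Schema fragment could say so like this. The rest of the
Fontconfig vocabulary is left out, so this is not a complete schema:

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Assumption: int holds a whole number; the DTD does not say so -->
  <xs:element name="int" type="xs:integer"/>
  <!-- Assumption: double holds a floating-point number -->
  <xs:element name="double" type="xs:double"/>
</xs:schema>
```

With declarations like these, a schema-validating parser would reject
text content such as "abcde", which a DTD cannot do.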

If the document type designer intends the element type to contain
unmarked text with additional semantics, that's fine, but a DTD does not
provide a way to specify that. What is int supposed to contain?

> Given that it will accept files that are not valid Fontconfig
> configurations, the validation function seems useless.

What is "it" in this context? Fontconfig? Or the DTD? Maybe the DTD has
been written incorrectly.

A DTD can only be used to test validity within the constraints of what
it has been told. If there are additional constraints which cannot be
expressed in XML Declaration Syntax (DTD-speak) then Fontconfig should
be using a different language like W3C Schema or RELAX NG.

Do you have an example?

>> In either case, hardly anyone nowadays actually codes a DTD or Schema
>> by hand; most are machine-generated from a better and more
>> comprehensive language like Relax-NG or ODD.
> 
> Could you more accurately specify the Fontconfig config syntax in one of
> these?

Possibly. It depends on what you need to tell it. If we go back to the
int element type, all the DTD has been told is that it can contain any
valid characters but not element markup. Here is a valid XML document:

<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
  <config>
    <rescan>
      <int>abcde</int>
    </rescan>
  </config>
</fontconfig>

It's meaningless, but it's valid according to the DTD, and you can check
that with any validating parser. Depending on what int is supposed to
contain, a W3C Schema (or, easier to work with, a RELAX NG grammar) can
express constraints which limit what text you can put in int. But right
now, int can contain any text you want, provided there are no more
elements in it — you could stuff the entire unmarked text of Moby Dick
in there if you wanted, and a validating parser would say the file was
valid...because it is.
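
For comparison, here is a minimal RELAX NG sketch (XML syntax) of the
kind of constraint I mean. It assumes int should hold an integer, which
is my guess rather than anything the DTD states, and it covers only
that one element type, not the whole Fontconfig vocabulary:

```xml
<?xml version="1.0"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- For illustration the grammar starts at int itself -->
  <start>
    <ref name="int"/>
  </start>
  <define name="int">
    <element name="int">
      <!-- Assumption: the content must be a lexically valid integer -->
      <data type="integer"/>
    </element>
  </define>
</grammar>
```

Against this grammar, <int>30</int> would be valid and
<int>abcde</int> would not.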

From what you said earlier about invalid files, is the Fontconfig
application reading the files with a validating XML parser? Or is it
using something else?

///Peter

_______________________________________________
Fontconfig mailing list
Fontconfig@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/fontconfig