Re: DTD

Peter Flynn <peter@xxxxxxxxxxx> · Tue, 3 Apr 2018 09:36:27 +0100

On 02/04/18 23:06, Lawrence D'Oliveiro wrote:
> On Mon, 2 Apr 2018 22:02:41 +0100, Peter Flynn wrote:
> 
>> On 01/04/18 22:10, Lawrence D'Oliveiro wrote:
>>
>>> The content for these is *not* “just strings”.  
>>
>> But only a human can know that; it is probably documented somewhere
>> else.
> 
> Fontconfig knows it too.

Then it should be in the DTD. Maybe the DTD was not written to show
this. Do we know how the document analysis was done?

A DTD is meant to provide testable constraints on the formation of
documents, to stop an errant author or editor putting a heading in the
middle of a paragraph, for example; and to guarantee to an application
that the data is already syntactically valid, so the application doesn't
need to waste time and space checking things that are known to be
correct. You can map a DTD or Schema to a C or Java class, so your
application can just swallow the data in one gulp.

If the DTD is not doing that, then it should be updated to do so.

>> The DTD says the content is "parsed character data", which is
>> text with no further element markup; in effect "just strings", so
>> that's it as far as SGML or XML is concerned if you use a DTD alone.
> 
> Precisely why a DTD is useless.

Possibly this DTD is. I still don't know, because no-one will tell me
what elements like int are supposed to contain. Does anyone actually
have information on this?

I think the designer may not have understood that elements are for text.
To store categorical and other non-text data, XML provides attributes
which can be (loosely) data-typed (W3C Schemas provide tighter data typing).

>> A W3C Schema can constrain character data content more finely, and
>> Schematron can apply additional validation rules.
> 
> Is it better to replace the DTD with a “Schematron”, then?

You would need both: Schematron is a separate constraint-checking
language that works in cooperation with either a DTD or a W3C Schema.
Overkill for this application, I would have thought.

It depends on how critical data typing is, and who is creating the files.

EXAMPLE:

   I have an application using a DTD that needs to accept and
   store dates in ISO format (YYYY-MM-DD). DTDs don't have a
   "date" data type because they are for storing text, not
   data. (W3C Schemas provide a "date" data type, but they are
   many orders of magnitude more complex than DTDs, and overkill
   for my application.) So I created a compulsory attribute
   actually called "YYYY-MM-DD", so now when someone creates a
   date, the editor pops up a prompt for "YYYY-MM-DD", which is
   self-explanatory enough that in 25 years we haven't had anyone
   put in a bogus date. It's "good enough" for this application.
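For comparison, this is roughly what the application would have to do
itself if it checked that attribute rather than relying on the prompt.
It is only a minimal sketch in Python. The element name "date" is a
hypothetical choice for illustration; the attribute name is the one
described above. A DTD can make the attribute #REQUIRED but cannot check
its value, whereas a W3C Schema "xs:date" type would do much the same
thing declaratively.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical sample document: the element name "date" is an assumption.
SAMPLE = """<records>
  <date YYYY-MM-DD="2018-04-03"/>
  <date YYYY-MM-DD="2018-02-30"/>
</records>"""

# Strict YYYY-MM-DD shape (newer Pythons' fromisoformat also accepts
# other ISO 8601 forms such as 20180403, so check the shape first).
ISO_SHAPE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

def bad_dates(xml_text):
    """Return every YYYY-MM-DD attribute value that is not a real date."""
    root = ET.fromstring(xml_text)
    bad = []
    for el in root.iter("date"):
        value = el.get("YYYY-MM-DD", "")
        if not ISO_SHAPE.fullmatch(value):
            bad.append(value)
            continue
        try:
            # Rejects impossible calendar dates such as 30 February.
            date.fromisoformat(value)
        except ValueError:
            bad.append(value)
    return bad

print(bad_dates(SAMPLE))  # ['2018-02-30']
```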

>>> Given that it will accept files that are not valid Fontconfig
>>> configurations, the validation function seems useless.  
>>
>> What is "it" in this context? Fontconfig? Or the DTD? Maybe the DTD
>> has been written incorrectly.
> 
> But you have just admitted that there is no way to write the DTD
> correctly.

No, precisely the opposite: you can *always* write the DTD (or W3C
Schema) correctly. You just have to provide the relevant information.
But this DTD seems to have been written using a different set of
assumptions (not uncommon: a lot of people think XML is some kind of
programming language).

Perhaps I wasn't clear enough in my question: what is int (for example)
supposed to contain? Letters? Numbers? Digits only (the name implies
"integer")? The name of an internal function? A string testable by a
regexp? Or something else unexplained?
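If the answer turns out to be "an optionally signed decimal integer", the
rule is trivial to state, yet it is exactly what #PCDATA in a DTD cannot
express (a W3C Schema "xs:integer" could). A minimal Python sketch,
assuming (nothing in this thread confirms it) that int should hold
nothing but such an integer; the sample document is hypothetical:

```python
import re
import xml.etree.ElementTree as ET

# Assumption for illustration only: <int> holds an optionally signed
# decimal integer, with surrounding whitespace allowed. The DTD itself
# only says #PCDATA, which accepts any text at all.
INT_PATTERN = re.compile(r"\s*[+-]?[0-9]+\s*")

def invalid_ints(xml_text):
    """Return the text of every <int> element that is not an integer."""
    root = ET.fromstring(xml_text)
    return [el.text or "" for el in root.iter("int")
            if not INT_PATTERN.fullmatch(el.text or "")]

SAMPLE = """<fontconfig>
  <int>42</int>
  <int>-7</int>
  <int>forty-two</int>
</fontconfig>"""

print(invalid_ints(SAMPLE))  # ['forty-two']
```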

Do you have examples or documentation of valid Fontconfig XML configs
which demonstrate what all the elements are for? So far no-one seems to
have this information.

> Or maybe it should simply continue using what it already uses, a
> language called C, which exactly accepts valid Fontconfig configs, no
> more and no less

But it only does that because you programmed it to. Do you have examples
of valid Fontconfig configs? Enough to come up with a proper grammar for
them? Storing configs in a programming language would work, but it
probably wouldn't be testable by the creator independently of the
application, so the application would have to do all the work of parsing
and validation. With valid XML, the data is guaranteed to conform to the
DTD or Schema before you start.

A C compiler on its own doesn't DO anything: you have to feed it some
code written in error-free C syntax. Feeding it broken C code will
result in errors.

A validating XML parser on its own doesn't do anything either: you have
to feed it a document written in XML syntax that conforms to a DTD or
Schema. Feeding it a broken document will result in errors.

I think someone may have misunderstood what XML is for.

XML is a language to describe the structure and content of TEXT
documents, like books, articles, manuals, reports, journals, letters,
legal/literary/historical documents, manuscripts, web pages, etc. It's
used to control document formation, so I can fire up my XML editor
(Emacs, in my case), tell it to create a "fontconfig" document of the
type specified by "fonts.dtd" (that's what the DOCTYPE line is for), and
it will configure itself to edit that type of document. I can then add
elements and text where needed, validate the document to ensure I
haven't made any booboos, and then save it, safe in the knowledge that
it is guaranteed valid XML and will be accepted as such by any XML tool
in the world.

It is POSSIBLE to use it to describe DATA instead (numeric, categorical,
and string information in tabular format) but this was not what it was
primarily designed for.  A W3C Schema adds data types and other bells
and whistles to allow much finer-grained control over exactly what data
types are valid in what locations in the document.

(Actually, a lot of data nowadays seems to be held in JSON instead, but
that assumes you are programming in JavaScript :-)

Data normally goes in data files. There are numerous formats for config
files which work fine, but creating them is prone to finger-slippage and
misunderstandings over what should go where. XML is just a file format
for which good editing facilities exist so that people can create the
files without errors; and for which parsers for programming languages
exist so that the application can open the file and know that all the
data is already syntactically valid (no need to check anything).

If the application doesn't need to allow individuals to create
Fontconfig files in this way, use a different file format.

P
_______________________________________________
Fontconfig mailing list
Fontconfig@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/fontconfig