On 02/04/18 23:06, Lawrence D'Oliveiro wrote:
> On Mon, 2 Apr 2018 22:02:41 +0100, Peter Flynn wrote:
>
>> On 01/04/18 22:10, Lawrence D'Oliveiro wrote:
>>
>>> The content for these is *not* “just strings”.
>>
>> But only a human can know that; it is probably documented somewhere
>> else.
>
> Fontconfig knows it too.

Then it should be in the DTD. Maybe the DTD was not written to show
this. Do we know how the document analysis was done?

A DTD is meant to provide testable constraints on the formation of
documents, to stop an errant author or editor putting a heading in the
middle of a paragraph, for example; and to guarantee to an application
that the data is already syntactically valid, so the application doesn't
need to waste time and space checking things that are known to be
correct. You can map a DTD or Schema to a C or Java class, so your
application can just swallow the data in one gulp.

If the DTD is not doing that, then it should be updated to do so.

>> The DTD says the content is "parsed character data", which is
>> text with no further element markup; in effect "just strings", so
>> that's it as far as SGML or XML is concerned if you use a DTD alone.
>
> Precisely why a DTD is useless.

Possibly this DTD is. I still don't know, because no-one will tell me
what elements like <int> are supposed to contain. Does anyone actually
have information on this?

I think the designer may not have understood that elements are for text.
To store categorical and other non-text data, XML provides attributes
which can be (loosely) data-typed (W3C Schemas provide tighter data
typing).

>> A W3C Schema can constrain character data content more finely, and
>> Schematron can apply additional validation rules.
>
> Is it better to replace the DTD with a “Schematron”, then?

You would need both: Schematron is a separate constraint-checking
language that works in cooperation with either a DTD or a W3C Schema.
Overkill for this application, I would have thought.
It depends on how critical data typing is, and who is creating the
files.

EXAMPLE: I have an application using a DTD that needs to accept and
store dates in ISO format (YYYY-MM-DD). DTDs don't have a "date" data
type because they are for storing text, not data (W3C Schemas provide a
"date" data type, but they are many orders of magnitude more complex
than DTDs, and overkill for my application). So I created a compulsory
attribute actually called "YYYY-MM-DD", so now when someone creates a
date, it pops up a prompt for "YYYY-MM-DD", which is self-explanatory
enough that in 25 years we haven't had anyone put in a bogus date. It's
"good enough" for this application.

>>> Given that it will accept files that are not valid Fontconfig
>>> configurations, the validation function seems useless.
>>
>> What is "it" in this context? Fontconfig? Or the DTD? Maybe the DTD
>> has been written incorrectly.
>
> But you have just admitted that there is no way to write the DTD
> correctly.

No, precisely the opposite: you can *always* write the DTD (or W3C
Schema) correctly. You just have to provide the relevant information.
But this DTD seems to have been written using a different set of
assumptions (not uncommon: a lot of people think XML is some kind of
programming language).

Perhaps I wasn't clear enough in my question: what is <int> (for
example) supposed to contain? Letters? Numbers? Digits only (the name
implies "integer")? The name of an internal function? A string testable
by a regexp? Or something else unexplained?

Do you have examples or documentation of valid Fontconfig XML configs
which demonstrate what all the elements are for? So far no-one seems to
have this information.

> Or maybe it should simply continue using what it already uses, a
> language called C, which exactly accepts valid Fontconfig configs, no
> more and no less

But it only does that because you programmed it to. Do you have examples
of valid Fontconfig configs?
Enough to come up with a proper grammar for them?

Storing configs in a programming language would work, but it probably
wouldn't be testable by the creator independently of the application, so
the application would have to do all the work of parsing and validation.
With valid XML, the data is guaranteed correct before you start.

A C compiler on its own doesn't DO anything: you have to feed it some
code written in error-free C syntax. Feeding it broken C code will
result in errors.

A validating XML parser on its own doesn't do anything either: you have
to feed it a document written in XML syntax that conforms to a DTD or
Schema. Feeding it broken code will result in errors.

I think someone may have misunderstood what XML is for.

XML is a language to describe the structure and content of TEXT
documents, like books, articles, manuals, reports, journals, letters,
legal/literary/historical documents, manuscripts, web pages, etc. It's
used to control document formation, so I can fire up my XML editor
(Emacs, in my case), tell it to create a "fontconfig" document of the
type specified by "fonts.dtd" (that's what the DOCTYPE line is for), and
it will configure itself to edit that type of document. I can then add
elements and text where needed, validate the document to ensure I
haven't made any booboos, and then save it, safe in the knowledge that
it is guaranteed valid XML and will be accepted as such by any XML tool
in the world.

It is POSSIBLE to use it to describe DATA instead (numeric, categorical,
and string information in tabular format) but this was not what it was
primarily designed for. A W3C Schema adds data types and other bells and
whistles to allow much finer-grained control over exactly what data
types are valid in what locations in the document. (Actually, a lot of
data nowadays seems to be held in JSON instead, but that assumes you are
programming in JavaScript :-)

Data normally goes in data files.
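
As a rough illustration of two points made above (broken markup is
rejected by the parser, while content declared only as parsed character
data reaches the application as a plain string), here is a minimal
Python sketch. It assumes Python's standard xml.etree.ElementTree
module, which checks well-formedness only and does not validate against
a DTD. The <edit>/<int> fragments and the read_int and classify helpers
are made up for illustration; they are not taken from a real Fontconfig
config or from Fontconfig's own code.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments for illustration only; not real Fontconfig configs.
WELL_FORMED = "<edit><int>12</int></edit>"
NOT_AN_INT = "<edit><int>twelve</int></edit>"
BROKEN = "<edit><int>12</edit>"  # mismatched tags: not well-formed XML


def read_int(xml_text):
    """Return the content of the <int> child as an int, or raise ValueError."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    text = root.findtext("int")     # character data arrives as a plain str
    if text is None:
        raise ValueError("no <int> element found")
    return int(text.strip())        # a type check that #PCDATA cannot express


def classify(xml_text):
    """Report whether the parser or the application rejected the input."""
    try:
        return ("ok", read_int(xml_text))
    except ET.ParseError:
        return ("not well-formed", None)
    except ValueError:
        return ("well-formed, but <int> is not an integer", None)


if __name__ == "__main__":
    for sample in (WELL_FORMED, NOT_AN_INT, BROKEN):
        print(sample, "->", classify(sample))
```

The parser rejects BROKEN on its own, but NOT_AN_INT passes the parser
and has to be caught by the application. That second check is the kind
of work a schema with real data types would take off the application.
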
There are numerous formats for config files which work fine, but
creating them is prone to finger-slippage and misunderstandings over
what should go where. XML is just a file format for which good editing
facilities exist so that people can create the files without errors; and
for which parsers for programming languages exist so that the
application can open the file and know that all the data is already
syntactically valid (no need to check anything).

If the application doesn't need to allow individuals to create
Fontconfig files in this way, use a different file format.

P
_______________________________________________
Fontconfig mailing list
Fontconfig@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/fontconfig