(Excuse the <pedantic> mode, the initial summary is for historical
purposes.)

<pedantic>

The Fedora Documentation Project's (FDP) goal is to "create
easy-to-follow, task-based documentation for Fedora Core users and
developers."[1] The FDP is part of the Fedora Project. The Fedora
Project's overall goal is to "build a complete, general purpose
operating system exclusively from open source software."[2]

For a few days, some of the Fedora Documentation Project folks have been
contemplating using an XML normalization utility to clean up XML into a
standardized presentation before a CVS commit happens. The cleanup
would include, but not be limited to, the following:

* set standard fill-column (the columnar position after which
  unprotected text is wrapped)
* set indentation size
* set block/inline tag vertical spacing

While performing the cleanup procedure, any utility used must also
protect the DTD block, any CDATA containers, and any similar containers
such as <screen>, <programlisting>, and <literal>. It must be
configurable in such a way that changes to the configuration can be
provided cleanly via CVS.

Clients would perform this procedure as part of a "make" target before
committing changes. The details on this part have yet to be worked out,
but we would certainly try to make it as painless as possible --
possibly even simply making it a prerequisite to any other constructive
"make" target.

Without this step, we are running a risk of generating a lot more white
noise in CVS and on the fedora-docs-commits list. In 2003, with a
smaller, less visible project, the use of Emacs/PSGML was simply
*required*, more or less, so normalization was enforced on the client
side without any additional fuss. With more participants, however, we
have to confront the fact that people want to use their own favorite
tools. XML normalization makes cooperation on the same document possible
for writers and editors who enjoy different tools, by ensuring that CVS
diffs are sensible.
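To make the diff-noise point concrete, here is a rough Python sketch. It
is only an illustration, not any tool we are evaluating: the fill-column
of 72 and the helper names normalize() and changed_lines() are
assumptions made up for this example. It wraps the same paragraph the
way two different editors might, then counts how many lines a diff would
flag before and after both copies are re-wrapped to one fill-column.

```python
import difflib
import textwrap

FILL_COLUMN = 72  # assumed value; this message does not fix one


def normalize(text: str, width: int = FILL_COLUMN) -> list[str]:
    """Collapse runs of whitespace and re-wrap to a single fill-column."""
    return textwrap.wrap(" ".join(text.split()), width=width)


def changed_lines(old: list[str], new: list[str]) -> int:
    """Count lines that differ between two versions (removed plus added)."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return sum(
        (i2 - i1) + (j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


PARA = (
    "XML normalization makes cooperation on the same document possible "
    "for writers and editors who enjoy different tools, by ensuring "
    "that CVS diffs are sensible."
)

# Same words, wrapped by two hypothetical editors with different settings.
editor_a = textwrap.wrap(PARA, width=60)
editor_b = textwrap.wrap(PARA, width=78)

# Without normalization the diff flags lines although no word changed.
raw_noise = changed_lines(editor_a, editor_b)

# After normalizing both copies, the diff is empty.
clean_noise = changed_lines(
    normalize("\n".join(editor_a)),
    normalize("\n".join(editor_b)),
)

print("lines flagged without normalization:", raw_noise)
print("lines flagged with normalization:", clean_noise)
```

A real normalizer has to do this per element and leave protected
containers alone, which plain re-wrapping like this does not.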
</pedantic>

The "tidy" utility is GPL and in Fedora Extras, and it will do some XML
cleanup, but it is not designed for this purpose. It was designed as an
HTML normalization engine, and simply has some XML functionality. The
"xmlformat" utility is designed from the ground up as an XML normalizer,
but it is *not* GPL.

Thankfully, Tommy Reynolds brought the xmlformat licensing specification
to my attention last week, so I've had a little time to think about it.
The xmlformat utility is still open source software; although IANAL, I
did a pretty thorough review of the xmlformat license against the open
source requirements, and this seems pretty clear-cut. Note that the
"open source" requirement *does not* mean the software has to be GPL, or
BSD. It merely needs to meet the requirements and definition of "open
source."[3] The terms are clear-cut enough that we may not need an
official legal opinion from Mark Webbink, but I am willing to put a link
to this message at the appropriate wiki location for him to look at if
anyone thinks it's necessary.

Here are the facts of licensing pertaining to xmlformat:

(A) The original portions of xmlformat by Paul DuBois, paul at kitebird
com, are licensed under a BSD-style license.[4] The BSD-style license is
an open source license. The only portion of xmlformat not covered by
this license is the implementation of the REX shallow parser.

(B) The REX shallow parser, which is copyrighted by Robert D. Cameron,
cameron at sfu ca, is licensed under terms shown below in their
entirety:

"The following code may be freely used and distributed provided that
this copyright and citation notice remains intact and that modifications
or additions are clearly identified."[5]

The REX shallow parser clearly meets the requirements of the Open Source
Definition, to wit:

1. No royalties or fees are imposed upon redistribution of REX. The
   license specifically and categorically permits free distribution
   without additional restrictions.

2. The source code for REX is publicly available.

3. The copyright holder for REX allows modifications or derivative
   works. The licensing terms require that these be clearly identified,
   but put no restrictions on their creation. In addition, the terms
   explicitly permit free use of the material.

4. The license for REX does not set out any requirements for the
   licensing of modified versions, other than to require the
   modifications be identified as such. This requirement would allow
   modified versions to be distributed as the original version plus
   patches, as doing so would clearly identify modifications and thus
   meet the requirement.

5. The license for REX does not discriminate against persons or groups,
   nor against fields of endeavor.

6. The only requirements of the licensing terms flow through to and with
   any redistributed versions of REX, that is, the requirements for the
   copyright and citation notices to remain intact, and for
   modifications to be identified.

7. The license applies to the entire REX implementation. There are no
   subordinate parts or components as such.

8. The license does not restrict other software, and in fact REX can be
   distributed with other software. The xmlformat utility is itself an
   example of this use.

Because the REX software, and thus xmlformat, clearly meets all the
requirements of the open source definition, we should be able to use it
in our toolchain without incurring any difficulty. (REX probably also
meets the definition of "free software," although it is not copylefted
and thus does not share the same distinction as GPL software.)

I'll prepare an RPM of this package and see about getting it into Fedora
Extras. In the meantime, we can keep testing and evaluating other
methods of XML normalization. So far, xmlformat does the best job that
I've seen, but I'm sure there must be other tools out there. Does anyone
know whether Expat could easily do what we are trying to accomplish, or
am I comparing apples to oranges?
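For anyone trying out other tools, here is a rough Python sketch of how
a candidate could be checked against the protection requirement from the
summary (DTD block, CDATA, <screen>, <programlisting>, <literal>). It
treats the normalizer as an opaque function from text to text, so it
makes no claims about how xmlformat, tidy, or Expat are invoked. The
names protected_blocks() and check() are hypothetical, the regular
expressions are deliberately naive (no nested elements, simple DOCTYPE
subsets only), and the idempotence test is an extra check of my own, not
one of the stated requirements.

```python
import re
from typing import Callable

# Containers whose content must survive normalization untouched.
PROTECTED_ELEMENTS = ("screen", "programlisting", "literal")


def protected_blocks(xml: str) -> list[str]:
    """Return the DOCTYPE, CDATA, and protected element spans, verbatim."""
    patterns = [
        r"<!DOCTYPE[^\[>]*(?:\[.*?\])?\s*>",  # DTD block, naive
        r"<!\[CDATA\[.*?\]\]>",               # CDATA sections
    ]
    patterns += [
        rf"<{name}\b[^>]*>.*?</{name}>" for name in PROTECTED_ELEMENTS
    ]
    blocks: list[str] = []
    for pattern in patterns:
        blocks.extend(re.findall(pattern, xml, flags=re.S))
    return blocks


def check(normalizer: Callable[[str], str], sample: str) -> list[str]:
    """List the problems found when running a normalizer over a sample."""
    problems = []
    once = normalizer(sample)
    if protected_blocks(once) != protected_blocks(sample):
        problems.append("protected content was altered")
    if normalizer(once) != once:
        # A second run should be a no-op, or commits will keep churning.
        problems.append("not idempotent")
    return problems


SAMPLE = "<para>Run:</para>\n<screen>\n  make   html\n</screen>\n"


def identity(text: str) -> str:
    """Stand-in for a well-behaved tool: changes nothing."""
    return text


def naive(text: str) -> str:
    """Stand-in for a careless tool: collapses all whitespace."""
    return " ".join(text.split())


print(check(identity, SAMPLE))
print(check(naive, SAMPLE))
```

The careless stand-in fails because it flattens the whitespace inside
<screen>, which is exactly the kind of damage the requirement is meant
to rule out.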
= = = = =
[1] http://fedora.redhat.com/projects/docs/
[2] http://fedora.redhat.com/about/
[3] http://opensource.org/docs/definition.php
[4] http://www.kitebird.com/software/xmlformat/
[5] http://www.cs.sfu.ca/~cameron/REX.html#AppA

-- 
Paul W. Frields, RHCE
http://paul.frields.org/
gpg fingerprint: 3DA6 A0AC 6D58 FEC4 0233 5906 ACDB C937 BD11 3717
Fedora Documentation Project: http://fedora.redhat.com/projects/docs/