On Mon, Jun 29, 2009 at 01:37:31PM -0700, David Morris wrote:
> 1000 years from now, it will certainly be easier to recover content from
> an ascii 'file' than an html, xml, or pdf 'file' created now. It is
> probably an unjustified assumption that 'software' available 1000 years
> from now will be able to render today's html, xml, or pdf.

I am not sure I agree with this assertion. In 1000 years, I have every hope that some versions of PDF will be widely usable; but I think the currently prescribed format of electronic versions of RFCs is already obsolete, and will be unreadable in 1000 years.

PDF/A-1 is an ISO standard preferred by the U.S. Library of Congress for page-oriented textual (or primarily textual) documents "when layout and visual characteristics are more significant than logical structure." (http://www.digitalpreservation.gov/formats/fdd/fdd000125.shtml, visited 2009-06-29)

One could construct a reasonable argument that in the case of RFCs, the layout and visual characteristics are _not_ more significant than logical structure. But under the current publication regime, they are in fact more significant: we have detailed rules for publication about the exact "page" layout, the number of lines, the margins, the headers and footers, and even what "character" (i.e. line-printer character) ends a page. We have practically no guidance about the logical structure of documents, except that if the document is a given number of pages, it needs a table of contents.

Whether the logical structure of the document ought to be of higher concern in relation to the publication form is a topic argued elsewhere in this thread. I want to pay attention to whether PDF will be usable in 1000 years.

The Library of Congress, and librarians generally, take archival formats terribly seriously.
There is just about no hope of dislodging the MARC standard, for instance, even though every librarian I ever spoke to in my admittedly brief library career granted that MARC is miserably adapted to relational databases (which hadn't been invented when MARC was settled upon). The reason MARC can't be replaced is that it's the format they picked, and so everything has to work around it. Period. The technology it was invented around was obsolete before the standard even got widely adopted? Too bad. This is an _archival_ format, and therefore it Will Not Change. All future technology will simply be specified to use it. And it is so specified: one library automation system I knew of when I last looked at this (nearly 10 years ago, mind) stored every MARC record in BLOBs, and just did everything up in the application. Everyone except the sales people thought this a miserable hack, but the MARC format was preserved. Thus do relational theorists go slowly insane.

If librarians have picked PDF/A-1 as an electronic format that they're going to use -- particularly, if LC has picked it -- then I have every confidence that the format will be supported somewhere for roughly as long as there remain readers on Earth. I am more concerned, in fact, about widespread inability to read than I am about librarians dropping support for some archival format they selected. They are way more serious about keeping old archival formats working than the IETF has ever been about making FTP continue to work everywhere.

ASCII, on the other hand, doesn't meet any of the librarians' criteria, and never did. It is too restrictive even to deal with non-American titles in the library catalogue (e.g. books priced in pounds sterling), never mind to deal with non-English titles. ASCII was a bugbear in library automation systems from the very beginning.
Certainly, files of supposedly plain text, containing the occasional control character used to format pages for a specific line printer that was once attached to some ancient computer system on the Internet, are not an archival format that any responsible librarian would sign up for.

Best regards,

A

--
Andrew Sullivan
ajs@xxxxxxxxxxxx
Shinkuro, Inc.