Let us start with an existing graphics format, for inspiration if nothing else.
OK.
The format I chose is PNG, because it is arguably the best lossless portable graphics format available.
Well, I would argue that TIFF has the "crown"...
However, PNG is an excellent standard, regardless.
4. Capable of representing trees and graphs
Trees, yes - for things like layers. But why a graph??
5. Recoverable from corruption
6. Fast random access of data
9. Fast loads and saves
10. Compact
Good goals, but not requirements. Perhaps you should separate those two things out...
And I can think of other goals that I'd like to see:
* incremental update: just update a single layer w/o rewriting the whole file!
* rich metadata (this may be your 7, but it needs to be spelled out)
PNG certainly supports 1,2,6,7,9,10, and 11. Let us examine the other issues in more detail.
I would argue that PNG doesn't do 7 - it has no native support for CMYK, for example (but yes, it does RGB, gray and indexed).
And for comparison, I would offer that TIFF covers the same list and REALLY does 7, including CMYK, Lab, ICC and spot color spaces. Its extensibility is similar to PNG's (in fact, PNG's chunks were modelled on TIFF chunks).
A pure XML format, by way of comparison, would fulfill requirements 1,2,3,4,7, and 8.
I'd add 9; just being in XML doesn't mean it can't be fast.
Requirement 5 in practice would be difficult to fulfill in a pure XML format without hand-hacking, which is beyond the skills of most users. A zlib-style compression step could make some progress towards 10.
But gzipping the entire XML block would then pretty much make 6 impossible, unless you are willing to seriously increase in-memory requirements.
An archive with XML metadata and png graphical data, on the other hand, would satisfy requirements 1,2,3,4,7,8, and 11.
An archive (zip, tar, ar) with XML metadata plus raster image data (i.e., my previous proposal) would meet 1,2,3,4,6,7,8,10,11. 5 & 10 depend on the archive format chosen, since some are better at these than others. But yes, I suspect it would be a bit slower.
Requirement 6 is fulfilled for simple images, but for more complex images XML does not scale well, since every byte from the beginning of the XML file up to the data you are interested in has to be read and parsed.
But the XML is just a "catalog" of what's in the archive (at least in my proposal). So you read the catalog up front and then use it to quickly find the part of the archive you want and voila - fast random access to data.
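A minimal sketch of the "XML catalog inside an archive" idea, using only Python's standard `zipfile` and `xml.etree` modules. The member names (`catalog.xml`, `layers/<name>.raw`) and the `<image>`/`<layer>` element names are hypothetical, chosen for illustration, not part of any agreed format.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def write_archive(layers: dict[str, bytes]) -> bytes:
    """Store each layer as its own archive member plus a small XML catalog (hypothetical layout)."""
    buf = io.BytesIO()
    catalog = ET.Element("image")
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in layers.items():
            member = f"layers/{name}.raw"
            zf.writestr(member, data)
            # Catalog entry maps the logical layer name to its archive member.
            ET.SubElement(catalog, "layer", name=name, src=member)
        zf.writestr("catalog.xml", ET.tostring(catalog))
    return buf.getvalue()

def read_layer(archive: bytes, layer_name: str) -> bytes:
    """Parse the small catalog first, then jump straight to one layer."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        catalog = ET.fromstring(zf.read("catalog.xml"))
        for layer in catalog.iter("layer"):
            if layer.get("name") == layer_name:
                # zipfile finds the member through the archive's central directory,
                # so the other layers are never read or decompressed.
                return zf.read(layer.get("src"))
    raise KeyError(layer_name)
```

The design point is that only the catalog is XML; the bulky pixel data never passes through an XML parser, and zip's own central directory supplies the byte offsets.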
It seems like all we have to do is combine the strengths of PNG and the strengths of XML to create a format that satisfies our requirements. What we really need is not an extensible text markup language, but an extensible graphics markup format.
That's what TIFF and PNG were designed for.
Portable XCF would use a chunk system similar to PNG, with two major differences. First, chunk type would be a string instead of a 32-bit value. Second, chunks can contain an arbitrary number of subchunks, which of course can contain subchunks themselves.
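To make the proposal concrete, here is a rough sketch of string-typed chunks that may nest. The byte layout is entirely hypothetical: a u8 type length, an ASCII type string, a u8 flags byte (bit 0 set means "container of subchunks"), a big-endian u32 body length, then the body.

```python
import struct
from dataclasses import dataclass, field

@dataclass
class Chunk:
    type: str
    data: bytes = b""
    children: list["Chunk"] = field(default_factory=list)

def encode(chunk: Chunk) -> bytes:
    type_bytes = chunk.type.encode("ascii")
    if chunk.children:
        # A container's body is just its subchunks, serialized back to back.
        body = b"".join(encode(c) for c in chunk.children)
        flags = 1
    else:
        body = chunk.data
        flags = 0
    return (struct.pack(">B", len(type_bytes)) + type_bytes
            + struct.pack(">BI", flags, len(body)) + body)

def decode(buf: bytes, pos: int = 0) -> tuple[Chunk, int]:
    """Decode one chunk (recursively) starting at pos; return it and the end offset."""
    (tlen,) = struct.unpack_from(">B", buf, pos)
    pos += 1
    chunk = Chunk(buf[pos:pos + tlen].decode("ascii"))
    pos += tlen
    flags, blen = struct.unpack_from(">BI", buf, pos)
    pos += 5
    end = pos + blen
    if flags & 1:
        while pos < end:
            child, pos = decode(buf, pos)
            chunk.children.append(child)
    else:
        chunk.data = buf[pos:end]
    return chunk, end
```

Note that a writer must know a container's total body length before emitting its header, so every subchunk has to be fully serialized (buffered) first; that is one source of the write-side overhead nested chunks can cause.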
I think sub-chunks are a bad idea. Although they are a common way to represent hierarchical relationships, they can also put overhead on random access and slow down reads/writes under certain conditions.
At the end of each chunk is a checksum, as well as a close-chunk marker. The purpose of the close-chunk marker is to help recover in case of corruption; if no corruption is detected, the close-chunk marker is ignored.
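A minimal sketch of how a trailing checksum plus close-chunk marker lets a reader resynchronise after damage. It assumes a hypothetical flat layout (u8 type length, ASCII type, u32 data length, data, u32 CRC-32 of type+data, then a fixed marker); the marker value is invented for illustration.

```python
import struct
import zlib

# Hypothetical close-chunk marker; any fixed byte string would do.
CLOSE = b"\x00END\xff"

def write_chunk(ctype: str, data: bytes) -> bytes:
    t = ctype.encode("ascii")
    crc = zlib.crc32(t + data)
    return (struct.pack(">B", len(t)) + t + struct.pack(">I", len(data)) + data
            + struct.pack(">I", crc) + CLOSE)

def read_chunks(buf: bytes) -> tuple[list[tuple[str, bytes]], int]:
    """Return the intact (type, data) chunks and the number of damaged ones skipped."""
    chunks, damaged, pos = [], 0, 0
    while pos < len(buf):
        ok = False
        try:
            tlen = buf[pos]
            t = buf[pos + 1:pos + 1 + tlen]
            (dlen,) = struct.unpack_from(">I", buf, pos + 1 + tlen)
            dstart = pos + 5 + tlen
            data = buf[dstart:dstart + dlen]
            (crc,) = struct.unpack_from(">I", buf, dstart + dlen)
            end = dstart + dlen + 4
            ok = (len(data) == dlen and zlib.crc32(t + data) == crc
                  and buf[end:end + len(CLOSE)] == CLOSE)
        except struct.error:
            pass  # length or checksum field runs past the end of the buffer
        if ok:
            chunks.append((t.decode("ascii", "replace"), data))
            pos = end + len(CLOSE)
        else:
            # Damaged chunk: resume just after the next close-chunk marker.
            damaged += 1
            marker = buf.find(CLOSE, pos)
            if marker == -1:
                break
            pos = marker + len(CLOSE)
    return chunks, damaged
```

One known weakness of this scheme: if the marker bytes happen to occur inside a damaged chunk's payload, resynchronisation lands in the wrong place, so real formats usually pick a marker that is unlikely in compressed data.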
This is a common technique in many file formats for corruption detection. It works.
One of the major advantages of this hybrid technique is that if an implementation does not understand or is not interested in a particular chunk, it can seek to the next chunk without having to read or parse any of the data in between.
How does it do that? How do you find "start of chunk" without a catalog? How do you get random access to a particular chunk w/o a catalog?
Image data chunks should use PNG-style adaptive predictive compression. They should also use Adam7 interlacing.
Great - but that's not specific to a file format - we can do that anywhere...
Leonard
--
Leonard Rosenthol <mailto:leonardr@xxxxxxxxxxxxx>
<http://www.lazerware.com>