Re: [Gimp-developer] GimpCon RFC: Portable XCF

Leonard Rosenthol <leonardr@xxxxxxxxxxxxx> · Sat, 09 Aug 2003 16:24:16 -0000

At 6:01 PM -0700 8/8/03, Nathan Carl Summers wrote:
Let us start with an existing graphics format, for inspiration if nothing
else.

	OK.

The format I chose is PNG, because it is arguably the best existing
lossless portable graphics format available.

	Well, I would argue that TIFF has the "crown"...

	However, PNG is an excellent standard, regardless.

4 capable of representing trees and graphs

	Trees, yes - for things like layers.   But why a graph??

5 recoverable from corruption
6 fast random access of data
9 fast loads and saves
10 compact

	Good goals, but not requirements.  Perhaps you should 
separate those two things out...

	And I can think of other goals that I'd like to see:

* incremental update
	just update a single layer w/o rewriting the whole file!
* rich metadata
	(this may be your 7, but needs to be spelled out)

PNG certainly supports 1,2,6,7,9,10, and 11.  Let us examine the other
issues in more detail.

	I would argue that PNG doesn't do 7 - it has no native 
support for CMYK, for example.  (but yes, it does RGB,  Gray and 
indexed).

	And for comparison, I would offer that TIFF does the same 
list and REALLY does 7, including CMYK, Lab, ICC and Spot color 
spaces.   Its extensibility is similar to PNG's (in fact, PNG's chunks 
were modelled on TIFF chunks).

A pure XML format, by way of comparison, would fulfill requirements
1,2,3,4,7, and 8.

	I'd add 9, just being in XML doesn't mean it can't be fast.

 Requirement 5 in practice would be difficult to fulfill
in a pure XML format without hand-hacking, which is beyond the skills of
most users.  A zlib-style compression step could make some progress
towards 10.

	But gzipping the entire XML block would then pretty much make 6 
impossible unless you want to seriously increase in-memory 
requirements.

An archive with XML metadata and png graphical data, on the other hand,
would satisfy requirements 1,2,3,4,7,8, and 11.

	An archive (zip, tar, ar) with XML metadata plus raster image 
data (ie. my previous proposal) would meet 1,2,3,4,6,7,8,10,11.   5 & 
10 are related to the archive format of choice since some are better 
at these than others.  But yes, I suspect that it would probably be a 
bit slower.

Requirement 6 is
fulfilled for simple images, but for more complex images XML does not
scale well, since every byte from the beginning of the XML file up to the
place where the data you are interested in is stored has to be read.

	But the XML is just a "catalog" of what's in the archive (at 
least in my proposal).  So you read the catalog up front and then use 
it to quickly find the part of the archive you want and voilà - fast 
random access to data.
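
	A minimal sketch of the catalog idea, assuming a zip archive as the container.  The entry names (`catalog.xml`, `layers/N.raw`), the XML element and attribute names, and the helpers `write_archive` and `read_layer` are all hypothetical, chosen only for illustration; nothing here is a proposed file layout.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def write_archive(layers):
    """Build an in-memory zip with one entry per layer plus catalog.xml.

    `layers` is a list of (name, raw_bytes) pairs. All entry names are
    hypothetical.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        root = ET.Element("image")
        for index, (name, pixels) in enumerate(layers):
            entry = f"layers/{index}.raw"
            zf.writestr(entry, pixels)
            # The catalog records where each layer lives in the archive.
            ET.SubElement(root, "layer", name=name, src=entry)
        zf.writestr("catalog.xml", ET.tostring(root))
    return buf.getvalue()


def read_layer(archive_bytes, layer_name):
    """Read the catalog first, then fetch only the requested entry."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        catalog = ET.fromstring(zf.read("catalog.xml"))
        for layer in catalog.iter("layer"):
            if layer.get("name") == layer_name:
                # Only this one entry is located and decompressed.
                return zf.read(layer.get("src"))
    raise KeyError(layer_name)
```

	Only the small catalog and the one requested entry are decompressed; the other layers are never touched.  How well this holds up for requirements 5 and 10 still depends on the archive format chosen.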

It seems like all we have to do is combine the strengths of PNG and the
strengths of XML to create a format that satisfies our requirements.  What
we really need is not an extensible text markup language, but an
extensible graphics markup format.

	That's what TIFF and PNG were designed for.

Portable XCF would use a chunk system similar to PNG, with two major
differences.  First, chunk type would be a string instead of a 32-bit
value.  Second, chunks can contain an arbitrary number of subchunks, which
of course can contain subchunks themselves.

	I think sub-chunks are a bad idea.  Although a common way to 
represent hierarchical relationships, they can also put overhead on 
random access and slow down read/write under certain conditions.

At the end of each chunk is a checksum, as well as a close-chunk marker.
The purpose of the close-chunk marker is to help recover in case of
corruption; if no corruption is detected, the close-chunk marker is
ignored.

	This is a common technique in many file formats for 
corruption detection.  It works.
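
	As a rough illustration of a chunk that ends with a checksum and a close-chunk marker, here is a sketch.  The byte layout (one-byte name length, string type, 32-bit payload length, CRC-32, marker) and the marker value are assumptions made up for this example; the RFC text quoted above does not specify them.

```python
import struct
import zlib

CLOSE_MARKER = b"\x00END"  # hypothetical close-chunk marker


def pack_chunk(chunk_type, payload):
    """Serialize one chunk: name, payload, CRC-32, close marker."""
    name = chunk_type.encode("utf-8")
    crc = zlib.crc32(name + payload)
    return (struct.pack(">B", len(name)) + name
            + struct.pack(">I", len(payload)) + payload
            + struct.pack(">I", crc) + CLOSE_MARKER)


def unpack_chunk(data, offset=0):
    """Parse one chunk at `offset`; return (type, payload, next_offset)."""
    (name_len,) = struct.unpack_from(">B", data, offset)
    offset += 1
    name = data[offset:offset + name_len]
    offset += name_len
    (size,) = struct.unpack_from(">I", data, offset)
    offset += 4
    payload = data[offset:offset + size]
    offset += size
    (stored_crc,) = struct.unpack_from(">I", data, offset)
    offset += 4
    if zlib.crc32(name + payload) != stored_crc:
        # A recovery pass could scan forward for CLOSE_MARKER here.
        raise ValueError("checksum mismatch in chunk %r" % name)
    # No corruption detected, so the close marker is skipped unread.
    offset += len(CLOSE_MARKER)
    return name.decode("utf-8"), payload, offset
```

	The marker only matters on the failure path: when the checksum does not match, a reader can search for the next marker to resynchronize instead of giving up on the rest of the file.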

One of the major advantages of this hybrid technique is that if an
implementation does not understand or is not interested in a particular
chunk, it can seek to the next chunk without having to read or parse any
of the data in-between.

	How does it do that?  How do you find "start of chunk" 
without a catalog?  How do you get random access to a particular 
chunk w/o a catalog?
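
	For reference, PNG itself handles the "seek to the next chunk" part with a length prefix: each chunk is a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC.  The sketch below walks such a stream; `make_chunk` and `find_chunk` are hypothetical helper names.  Note that it still has to visit every chunk header in order, which is exactly the cost a catalog would avoid.

```python
import io
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """Build one PNG-style chunk: length, type, data, CRC over type+data."""
    crc = zlib.crc32(chunk_type + data)
    return (struct.pack(">I", len(data)) + chunk_type + data
            + struct.pack(">I", crc))


def find_chunk(stream, wanted):
    """Return the data of the first `wanted` chunk, skipping others unread."""
    stream.seek(len(PNG_SIGNATURE))
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return None  # ran off the end without finding it
        length, chunk_type = struct.unpack(">I4s", header)
        if chunk_type == wanted:
            return stream.read(length)
        # Skip the data and the 4-byte CRC without reading them.
        stream.seek(length + 4, io.SEEK_CUR)
```

	So uninteresting chunks can be skipped without parsing their contents, but reaching the Nth chunk still takes N header reads and seeks.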

Image data chunks should use PNG-style adaptive predictive compression.
They should also use Adam7 interlacing.

	Great - but that's not specific to a file format - we can do 
that anywhere...
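
	To show that the filtering step really is independent of the container, here is a small sketch of PNG-style adaptive filtering on raw scanlines.  It covers only three of PNG's five filter types (0 None, 1 Sub, 2 Up) and picks one per row with the minimum-sum-of-absolute-values heuristic; the function names are hypothetical, and the output would still need to go through a deflate step.

```python
def filter_sub(row, bpp):
    """PNG filter type 1: subtract the byte `bpp` positions to the left."""
    return bytes(
        (row[i] - (row[i - bpp] if i >= bpp else 0)) & 0xFF
        for i in range(len(row))
    )


def filter_up(row, prev):
    """PNG filter type 2: subtract the byte directly above."""
    return bytes((a - b) & 0xFF for a, b in zip(row, prev))


def cost(filtered):
    """Sum of absolute values, treating each byte as signed."""
    return sum(b if b < 128 else 256 - b for b in filtered)


def choose_filter(row, prev, bpp):
    """Return (filter_type, filtered_row) with the lowest cost for this row."""
    candidates = {
        0: bytes(row),
        1: filter_sub(row, bpp),
        2: filter_up(row, prev),
    }
    best = min(candidates, key=lambda t: cost(candidates[t]))
    return best, candidates[best]
```

	A smooth horizontal gradient ends up with the Sub filter and a row identical to the one above it ends up with Up, in both cases producing runs of small values that deflate compresses well.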

Leonard
--
---------------------------------------------------------------------------
Leonard Rosenthol                            <mailto:leonardr@xxxxxxxxxxxxx>
                     			     <http://www.lazerware.com>