Good to get some high-quality feedback.

On Sat, 9 Aug 2003, Leonard Rosenthol wrote:

> At 6:01 PM -0700 8/8/03, Nathan Carl Summers wrote:
> >Let us start with an existing graphics format, for inspiration if nothing
> >else.
>
> OK.
>
> >The format I chose is PNG, because it is arguably the best existing
> >lossless portable graphics format available.
>
> Well, I would argue that TIFF has the "crown"...
>
> However, PNG is an excellent standard, regardless.

Good point. It can't hurt to take a look at several graphics formats and
take the best parts from each of them.

> >4 capable of representing trees and graphs
>
> Trees, yes - for things like layers. But why a graph??

GEGL supports graphs. If we use GEGL graphs, we'll need a representation ;)

> >5 recoverable from corruption
> >6 fast random access of data
> >9 fast loads and saves
> >10 compact
>
> Good goals, but not requirements. Perhaps you should
> separate those two things out...

I see fast loads as an absolute requirement. Being compact is nice as
well, because not everyone has 3-terabyte hard drives and a T3 line into
their house.

Hopefully, GIMP's file handling will improve to the point where it will
load things on an as-needed basis. Therefore, fast random access is
necessary. A VIPS-like demand-driven pipeline would improve GIMP's
responsiveness a lot.

> And I can think of other goals that I'd like to see:
>
> * incremental update
> just update a single layer w/o rewriting the whole file!

This seems like an excellent goal. It seems like you are suggesting a
database-like format.

> * rich metadata
> (this may be your 7, but needs to be spelled out)

Well, that was what I meant by extensibility and the ability to represent
anything GIMP can. I agree that this is important.

> >PNG certainly supports 1,2,6,7,9,10, and 11. Let us examine the other
> >issues in more detail.
>
> I would argue that PNG doesn't do 7 - it has no native
> support for CMYK, for example. (But yes, it does RGB, Gray and
> indexed.)
>
> And for comparison, I would offer that TIFF does the same
> list and REALLY does 7, including CMYK, Lab, ICC and Spot color
> spaces. Its extensibility is similar to PNG's (in fact, PNG's chunks
> were modelled on TIFF chunks).

Indeed.

> >A pure XML format, by way of comparison, would fulfill requirements
> >1,2,3,4,7, and 8.
>
> I'd add 9; just being in XML doesn't mean it can't be fast.

I guess so, if you used raw image data instead of base64 or something
similar.

> >Requirement 5 in practice would be difficult to fulfill
> >in a pure XML format without hand-hacking, which is beyond the skills of
> >most users. A zlib-style compression step could make some progress
> >towards 10.
>
> But gzipping the entire XML block would then pretty much make 6
> impossible unless you want to seriously increase in-memory
> requirements.

Right.

> >An archive with XML metadata and PNG graphical data, on the other hand,
> >would satisfy requirements 1,2,3,4,7,8, and 11.
>
> An archive (zip, tar, ar) with XML metadata plus raster image
> data (i.e. my previous proposal) would meet 1,2,3,4,6,7,8,10,11. 5 &
> 10 are related to the archive format of choice since some are better
> at these than others. But yes, I suspect that it would probably be a
> bit slower.
>
> >Requirement 6 is
> >fulfilled for simple images, but for more complex images XML does not
> >scale well, since every byte from the beginning of the XML file up to
> >the data you are interested in has to be read and parsed.
>
> But the XML is just a "catalog" of what's in the archive (at
> least in my proposal). So you read the catalog up front and then use
> it to quickly find the part of the archive you want and voilà - fast
> random access to data.
>
> >It seems like all we have to do is combine the strengths of PNG and the
> >strengths of XML to create a format that satisfies our requirements. What
> >we really need is not an extensible text markup language, but an
> >extensible graphics markup format.
>
> That's what TIFF and PNG were designed for.
>
> >Portable XCF would use a chunk system similar to PNG, with two major
> >differences. First, chunk type would be a string instead of a 32-bit
> >value. Second, chunks can contain an arbitrary number of subchunks, which
> >of course can contain subchunks themselves.
>
> I think sub-chunks are a bad idea. Although they are a common way to
> represent hierarchical relationships, they can also put overhead on
> random access and also slow down read/write under certain conditions.

How about a TIFF-like directory chunk at the beginning (except
hierarchical)?

> >At the end of each chunk is a checksum, as well as a close-chunk marker.
> >The purpose of the close-chunk marker is to help recover in case of
> >corruption; if no corruption is detected, the close-chunk marker is
> >ignored.
>
> This is a common technique in many file formats for
> corruption detection. It works.
>
> >One of the major advantages of this hybrid technique is that if an
> >implementation does not understand or is not interested in a particular
> >chunk, it can seek to the next chunk without having to read or parse any
> >of the data in-between.
>
> How does it do that? How do you find "start of chunk"
> without a catalog? How do you get random access to a particular
> chunk w/o a catalog?

It traverses the file in a linked-list style: each chunk's header gives
its length, so a reader can seek straight past it to the next chunk. But
you are right that a directory block would be even faster.

> >Image data chunks should use PNG-style adaptive predictive compression.
> >They should also use Adam7.
>
> Great - but that's not specific to a file format - we can do
> that anywhere...

Indeed we can.

Rockwalrus
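A minimal sketch of the nested chunk layout proposed above, written in Python for illustration. Everything concrete here is an assumption and not part of the proposal: the byte layout (u16 type-name length, UTF-8 type string, a one-byte container flag, u32 body length, CRC-32 over header and body, then a 4-byte close-chunk marker), the marker value, and the names `Chunk`, `write_chunk` and `read_chunks`. The reader walks the chunks linked-list style, using each length field to jump to the next one. It does not checksum or parse any chunk type the caller isn't interested in.

```python
from __future__ import annotations

import struct
import zlib
from dataclasses import dataclass, field
from io import BytesIO

# Hypothetical 4-byte close-chunk marker; the proposal does not specify one.
CLOSE_MARKER = b"\x00END"


@dataclass
class Chunk:
    ctype: str                      # chunk type is a string, not a 32-bit code
    data: bytes = b""               # payload of a leaf chunk
    children: list[Chunk] = field(default_factory=list)  # subchunks of a container


def write_chunk(out, chunk: Chunk) -> None:
    """Serialize one chunk (and, recursively, its subchunks) to a binary stream."""
    name = chunk.ctype.encode("utf-8")
    if chunk.children:
        inner = BytesIO()
        for child in chunk.children:
            write_chunk(inner, child)
        body, is_container = inner.getvalue(), 1
    else:
        body, is_container = chunk.data, 0
    # Header: type length (u16), type string, container flag (u8), body length (u32).
    header = struct.pack(">H", len(name)) + name + struct.pack(">BI", is_container, len(body))
    out.write(header)
    out.write(body)
    out.write(struct.pack(">I", zlib.crc32(header + body)))  # checksum over header + body
    out.write(CLOSE_MARKER)


def read_chunks(buf: bytes, wanted: set[str] | None = None) -> list[Chunk]:
    """Walk consecutive chunks linked-list style, skipping types not in `wanted`."""
    chunks, pos = [], 0
    while pos < len(buf):
        start = pos
        (name_len,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        ctype = buf[pos:pos + name_len].decode("utf-8")
        pos += name_len
        is_container, body_len = struct.unpack_from(">BI", buf, pos)
        pos += 5
        body_start = pos
        pos += body_len  # jump over the body using the length field
        (crc,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        if buf[pos:pos + 4] != CLOSE_MARKER:
            raise ValueError(f"missing close-chunk marker after {ctype!r}")
        pos += 4
        if wanted is not None and ctype not in wanted:
            continue  # uninteresting chunk: never checksummed or parsed
        if zlib.crc32(buf[start:body_start + body_len]) != crc:
            raise ValueError(f"checksum mismatch in {ctype!r}")
        body = buf[body_start:body_start + body_len]
        if is_container:
            chunks.append(Chunk(ctype, children=read_chunks(body, wanted)))
        else:
            chunks.append(Chunk(ctype, data=body))
    return chunks
```

Only the marker check is shown. Actually resynchronising after corruption, by scanning forward for the next close-chunk marker, is left out. That recovery would only be a heuristic, because the marker bytes can also occur inside chunk data.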
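The TIFF-like directory chunk suggested above could look roughly like this hypothetical sketch. The file starts with a signature, followed by a table that maps slash-separated paths to byte offsets and lengths, and then the data itself. The paths keep the hierarchy without nesting the data. The `PXCF` signature, the field widths and the function names are all made up for illustration. A reader parses only the directory and then seeks straight to the entry it wants.

```python
from __future__ import annotations

import struct
from io import BytesIO

MAGIC = b"PXCF"  # hypothetical 4-byte file signature


def write_with_directory(entries: dict[str, bytes]) -> bytes:
    """Write a directory of (path, offset, length) records, then the raw entries."""
    names = [n.encode("utf-8") for n in entries]
    blobs = list(entries.values())
    # Each directory record: name length (u16), name, offset (u64), length (u64).
    header_size = len(MAGIC) + 4 + sum(2 + len(n) + 16 for n in names)
    out = BytesIO()
    out.write(MAGIC)
    out.write(struct.pack(">I", len(names)))
    offset = header_size  # first entry starts right after the directory
    for name, blob in zip(names, blobs):
        out.write(struct.pack(">H", len(name)) + name + struct.pack(">QQ", offset, len(blob)))
        offset += len(blob)
    for blob in blobs:
        out.write(blob)
    return out.getvalue()


def read_directory(f) -> dict[str, tuple[int, int]]:
    """Parse only the directory; no entry data is read."""
    f.seek(0)
    if f.read(4) != MAGIC:
        raise ValueError("not a PXCF file")
    (count,) = struct.unpack(">I", f.read(4))
    directory = {}
    for _ in range(count):
        (name_len,) = struct.unpack(">H", f.read(2))
        name = f.read(name_len).decode("utf-8")
        offset, length = struct.unpack(">QQ", f.read(16))
        directory[name] = (offset, length)
    return directory


def read_entry(f, directory, name: str) -> bytes:
    """Random access: seek directly to one entry using its directory record."""
    offset, length = directory[name]
    f.seek(offset)
    return f.read(length)
```

One trade-off of this design is that the directory sits at the front and holds absolute offsets. Resizing any entry therefore means rewriting the directory, which works against incremental update. TIFF sidesteps this by letting each directory live anywhere in the file behind an offset pointer.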
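Leonard's archive-plus-catalog approach can be sketched with Python's standard `zipfile` and `xml.etree.ElementTree` modules. The catalog layout (an `<image>` element containing `<layer name width height src>` elements), the `layers/N.raw` member names and the function names are hypothetical. The ZIP central directory already gives random access to any member. Each member is deflated separately, so one layer can be read without inflating the others, unlike gzipping a single XML document. Pixel data is stored as raw bytes, so no base64 is needed.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def save(layers):
    """layers: list of (name, width, height, raw_pixel_bytes) tuples."""
    catalog = ET.Element("image")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for i, (name, width, height, pixels) in enumerate(layers):
            member = f"layers/{i}.raw"  # hypothetical member naming scheme
            ET.SubElement(catalog, "layer", name=name, width=str(width),
                          height=str(height), src=member)
            zf.writestr(member, pixels)  # raw bytes, compressed per member
        zf.writestr("catalog.xml", ET.tostring(catalog))
    return buf.getvalue()


def load_layer(blob, layer_name):
    """Read the catalog up front, then pull out just the one layer asked for."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        catalog = ET.fromstring(zf.read("catalog.xml"))
        for layer in catalog.iter("layer"):
            if layer.get("name") == layer_name:
                return (int(layer.get("width")), int(layer.get("height")),
                        zf.read(layer.get("src")))
    raise KeyError(layer_name)
```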
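For the image data chunks, here is a sketch of the two PNG techniques mentioned in the thread. The first is Adam7 interlacing, with pass origins and strides as defined in the PNG specification. The second is adaptive per-scanline filtering with the five PNG filter types. The filter type is chosen by the minimum-sum-of-absolute-differences heuristic that the spec suggests. The sketch covers only the prediction step; in PNG the filtered bytes are then deflate-compressed. The function names are illustrative, and pixels are plain Python lists and bytes rather than any GIMP data structure.

```python
# Adam7 passes as (start_row, start_col, row_step, col_step), per the PNG spec.
ADAM7 = [
    (0, 0, 8, 8), (0, 4, 8, 8), (4, 0, 8, 4), (0, 2, 4, 4),
    (2, 0, 4, 2), (0, 1, 2, 2), (1, 0, 2, 1),
]


def adam7_passes(pixels):
    """Split a 2-D list of pixels into the seven Adam7 sub-images (some may be empty)."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    passes = []
    for r0, c0, dr, dc in ADAM7:
        rows = [[pixels[r][c] for c in range(c0, width, dc)] for r in range(r0, height, dr)]
        passes.append(rows if rows and rows[0] else [])
    return passes


def paeth(a, b, c):
    """Paeth predictor: whichever of left, up, up-left is closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def _predict(ftype, a, b, c):
    # Filter types 0-4: None, Sub, Up, Average, Paeth.
    return (0, a, b, (a + b) // 2, paeth(a, b, c))[ftype]


def filter_scanline(ftype, line, prev, bpp):
    """Replace each byte with its difference from the predictor (mod 256)."""
    out = bytearray(len(line))
    for i, x in enumerate(line):
        a = line[i - bpp] if i >= bpp else 0   # byte to the left
        c = prev[i - bpp] if i >= bpp else 0   # byte above-left
        out[i] = (x - _predict(ftype, a, prev[i], c)) & 0xFF
    return bytes(out)


def unfilter_scanline(ftype, filtered, prev, bpp):
    """Invert filter_scanline; predictions use already-reconstructed bytes."""
    out = bytearray(len(filtered))
    for i, x in enumerate(filtered):
        a = out[i - bpp] if i >= bpp else 0
        c = prev[i - bpp] if i >= bpp else 0
        out[i] = (x + _predict(ftype, a, prev[i], c)) & 0xFF
    return bytes(out)


def choose_filter(line, prev, bpp):
    """Adaptive filtering: pick the type whose output has the smallest sum of
    absolute values, reading each output byte as a signed difference."""
    def cost(data):
        return sum(v if v < 128 else 256 - v for v in data)
    best = min(range(5), key=lambda t: cost(filter_scanline(t, line, prev, bpp)))
    return best, filter_scanline(best, line, prev, bpp)
```

For a smooth horizontal ramp, Sub (type 1) turns the scanline into small constant differences, which is what makes the later deflate step effective.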