Re: [Gimp-developer] Portable XCF

Leonard Rosenthol <leonardr@xxxxxxxxxxxxx> · Thu, 14 Aug 2003 23:11:17 -0000

At 1:58 PM -0700 8/14/03, Nathan Carl Summers wrote:
XML is a text markup language.  If the designers thought of using it for
raster graphics, it was an afterthought at best.

	Completely agreed.  Putting image data into the XML would be bad...

The XML/archive idea is the software equivalent of
making a motorcycle by strapping a go-cart engine to the back of a
bicycle.

	Not at all!  It's actually an elegant solution to the problem 
that XML was designed around the idea of links and NOT around a 
"single file container" model.  That's why many folks including 
OpenOffice, Adobe, etc. are using archive formats + XML as future 
data storage systems - best of both worlds.

1. Putting metadata right next to the data it describes is a Good Thing.

	"right next to" is an interesting choice of terms.  What does 
that really mean in a single file?  It's in the same file - does it 
really need to precede or follow it IMMEDIATELY in byte order?  As 
long as they are in the same physical file, so that one doesn't go 
w/o the other - I don't see an issue.

	But yes, separate physical files would be bad!

The XML "solution" arbitrarily separates human readable data from binary
data.

	As it should be.  The binary data is for computers, the human 
readable is for both humans and computers.

 No one has yet considered what is to be done about non-human
readable metadata, but I imagine it will be crammed into the archive file
some way, or Base64ed or whatever.

	I would think it might be based on size - over a certain 
threshold it goes into the archive, otherwise it's in the XML.
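
	A minimal sketch of that size-based split, in Python.  The 
256-byte limit, the element and attribute names, and the member 
naming scheme are all assumptions made up for illustration; nothing 
here is a settled part of any proposal.

```python
import base64
import xml.etree.ElementTree as ET

INLINE_LIMIT = 256  # bytes; an arbitrary threshold chosen for this sketch


def store_metadata(parent, name, payload, archive_members):
    """Attach binary metadata to `parent`, inline or by reference.

    Small payloads are Base64-encoded into the XML itself.  Larger ones
    are queued in `archive_members` (a dict standing in for the archive)
    and the XML only records where to find them.
    """
    item = ET.SubElement(parent, "metadata", name=name)
    if len(payload) <= INLINE_LIMIT:
        item.set("encoding", "base64")
        item.text = base64.b64encode(payload).decode("ascii")
    else:
        member = "metadata/" + name + ".bin"  # hypothetical naming scheme
        archive_members[member] = payload
        item.set("href", member)
    return item


layer = ET.Element("layer", name="background")
members = {}
small = store_metadata(layer, "thumbnail-hint", b"\x00\x01\x02", members)
large = store_metadata(layer, "icc-profile", bytes(4096), members)
```

	Either way the XML stays the single place to look things up; 
only the location of the bytes changes.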

Either way is total lossage.

	Why???

2. Imagine a very large image with a sizeable amount of metadata.

	OK.

 The user in our example only needs to manipulate a handful of
layers. A good way of handling this case is to not load everything into
memory.  Say that it just parses out the layer list at the start, and then
once a layer is selected and the metadata is requested, it is read in.

	OK.

With the XML proposal, the parser would have to parse through every byte
until it gets to the part it is interested in, which is inefficient.

	Parse through the XML, sure - but NOT the archive.  But the 
fact is that you would load the ENTIRE XML into memory (a DOM tree) 
at document load time - because that's your catalog of information as 
well as (though not necessarily) your metadata store.

	Perhaps, and this may be where you are going, there should be 
two XML files - one which is the master catalog and hierarchical 
organization system and the other which is the metadata itself.

3. None of the current suggestions for archive formats do a good job with
in-place editing.  AR can't even do random access.  Zip can do an ok job
with in-place editing, but it's messy and often no better than writing a
whole new file from scratch.  This means that a program that makes a small
change to a file, such as adding a comment, needs to read in and write a
ton of crap.

	Zip does just fine with in-place editing and can also be used 
for incremental update with on-demand garbage collection.  (I know, 
I've used it that way before.)
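
	As a rough illustration of incremental update, here is a 
sketch using Python's standard zipfile module rather than any 
particular C library.  The member names and the two helper functions 
are hypothetical.  Appending adds a new member and a fresh directory 
at the end while the existing member data stays where it was; 
compaction is a separate step run only when wanted.

```python
import io
import zipfile


def append_member(archive, name, data):
    """Incremental update: add one member without rewriting the others."""
    with zipfile.ZipFile(archive, "a", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)


def compact(archive, live_names):
    """On-demand garbage collection: copy only live members to a new archive."""
    fresh = io.BytesIO()
    with zipfile.ZipFile(archive) as old:
        with zipfile.ZipFile(fresh, "w", zipfile.ZIP_DEFLATED) as new:
            for info in old.infolist():
                if info.filename in live_names:
                    new.writestr(info.filename, old.read(info))
    return fresh


# Build a small archive in memory: one layer plus a comment.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("layers/0.raw", bytes(1000))
    zf.writestr("comments/old.txt", "stale comment")

with zipfile.ZipFile(buf) as zf:
    offset_before = zf.getinfo("layers/0.raw").header_offset

# Adding a comment does not move the layer data.
append_member(buf, "comments/new.txt", "a new comment")

with zipfile.ZipFile(buf) as zf:
    offset_after = zf.getinfo("layers/0.raw").header_offset

# Later, drop the stale comment by rewriting only what is still live.
compacted = compact(buf, {"layers/0.raw", "comments/new.txt"})
```

	The cost of the full rewrite is paid only at compaction time, 
not on every small change.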

4. Implementing a reader for the XML/archive combo is unnecessarily
complex.  It involves writing a parser for the semantics and structure of
XML, a parser for the semantics and structure of the archive format, and a
parser for the semantics and structure of the combination.

	Well, the XML would be parsed by libxml (most likely) and 
read into a DOM tree which would then be used in combination with 
some archive format library to read each piece on demand as necessary.
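
	To make the division of labour concrete, here is a small 
sketch of that flow.  It uses Python's standard xml.etree and zipfile 
modules in place of libxml and a C archive library, and the catalog 
layout (element names, the "href" attribute, "catalog.xml") is 
invented for the example.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

CATALOG = "catalog.xml"  # hypothetical name for the catalog member


def build_sample():
    """Create a tiny in-memory archive: an XML catalog plus two raw layers."""
    catalog = (
        '<image width="2" height="2">'
        '<layer name="background" href="layers/0.raw"/>'
        '<layer name="sketch" href="layers/1.raw"/>'
        "</image>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(CATALOG, catalog)
        zf.writestr("layers/0.raw", b"\x00" * 4)
        zf.writestr("layers/1.raw", b"\xff" * 4)
    return buf


class Document:
    def __init__(self, fileobj):
        self.archive = zipfile.ZipFile(fileobj)
        # The whole catalog is parsed into memory at load time.
        self.catalog = ET.fromstring(self.archive.read(CATALOG))

    def layer_names(self):
        """List layers from the in-memory tree; no layer data is read."""
        return [layer.get("name") for layer in self.catalog.iter("layer")]

    def read_layer(self, name):
        """Read one layer's bytes from the archive, only when asked for."""
        for layer in self.catalog.iter("layer"):
            if layer.get("name") == name:
                return self.archive.read(layer.get("href"))
        raise KeyError(name)


doc = Document(build_sample())
names = doc.layer_names()
sketch_pixels = doc.read_layer("sketch")
```

	Listing the layers touches only the catalog; the archive 
library locates a member through its directory, so the other layers 
are never read.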

	It has also been suggested that libgsf (the library that 
OpenOffice, AbiWord and Gnumeric all use to handle XML/archive file 
formats) might be the solution here to handle all of this.

It is true
that libraries might be found that are suitable for some of the work, but
developers of small apps will shun the extra bloat, and such libraries
might involve licensing fun.

	I am pretty sure that GNOME has all of the necessary pieces 
we'd need - or if not, something could be found.  I am pretty sure 
that if we decide on the archive/XML format, the real work would be 
in the integration.

The semantics and structure of the
combination is not a trivial aspect -- with a corrupt or buggy file, the
XML may not reflect the contents of the archive.  With an integrated
approach, this is not a concern.

	If the XML is the catalog, then inconsistent catalogs are a 
problem for ANY file format that uses them.  This is one of those 
areas where improved error handling and recovery need to be used.
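
	One possible shape for such a check, sketched in Python: 
compare what the catalog references against what the archive 
actually holds.  The "href" attribute, the "catalog.xml" name and the 
check_catalog helper are assumptions for illustration only.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def check_catalog(catalog_xml, archive_names, catalog_name="catalog.xml"):
    """Return (missing, orphaned) member names.

    missing:  referenced by the catalog but absent from the archive
    orphaned: present in the archive but not referenced by the catalog
    """
    root = ET.fromstring(catalog_xml)
    referenced = {el.get("href") for el in root.iter() if el.get("href")}
    present = set(archive_names) - {catalog_name}
    return sorted(referenced - present), sorted(present - referenced)


# A deliberately inconsistent file: the catalog names layers/1.raw,
# but the archive holds layers/2.raw instead.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "catalog.xml",
        '<image><layer href="layers/0.raw"/><layer href="layers/1.raw"/></image>',
    )
    zf.writestr("layers/0.raw", b"\x00")
    zf.writestr("layers/2.raw", b"\x00")

with zipfile.ZipFile(buf) as zf:
    missing, orphaned = check_catalog(zf.read("catalog.xml"), zf.namelist())
```

	What to do with the result (refuse to load, offer recovery of 
orphaned members, and so on) is a policy question this sketch leaves 
open.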

5. Either the individual layers will be stored as valid files in some
format, or they will be stored as raw data.  If they are stored as true
files, they will be needlessly redundant and we will be limited to
whatever limitations the data format we choose uses.  If we just store raw
data in the archive, then it's obvious that this is just a kludge around
the crappiness of binary data in XML.

	But isn't this the case with ANY file format, including your 
PNG-like design!?!?

Leonard
--
---------------------------------------------------------------------------
Leonard Rosenthol                            <mailto:leonardr@xxxxxxxxxxxxx>
                     			     <http://www.lazerware.com>