Re: New serialization format


On Jul 5, 2012, at 7:57 AM, Michael Natterer wrote:

> And XML was ruled out because it's not the latest fad any longer?

I think this is pretty much the right answer.  There is a ton of XML hate in the world right now.

Having fought this battle when dealing with millions of lines of code and hundreds of thousands of lines of JSON and/or XML, I can offer the following advice…

XML is probably the right answer here.  XML sucks in the following ways.

1. It's verbose.  This is actually good for humans, but it sucks as a wire format, and some people feel the verbosity makes it unreadable.  That's only true if you're able to keep all the context in your head.  Once someone screws up the indentation, or you're 1000 lines in and 12 nested levels deep, having the extra context of tag names makes a huge difference.  Also, gzip is awesome here and solves the on-disk space issues.

2. It's complex.  No argument here.  There are a lot of things it's supposed to do, and a major ambiguity that people always complain about (attributes vs. elements).

3. Many of the parsers are memory hogs (tree parsers) or very slow, largely because they copied too many strings (though that's gotten much better and doesn't apply to the parser gegl is using).

1 and 3 mean it sucks as an on-the-wire format for interactive HTTP requests (though gzip pretty much negates 1).  2 means it's hard to write a fast JS parser for it, which means your HTML5 app will get slow.

Everyone says "it's more readable!"  Then they try to maintain a large file in JSON.  Then they discover that validation, line numbers for errors, and a more expressive grammar go a long way towards keeping programs simpler.  The first time you spend an hour trying to track down where a missing "," caused your entire file to fail to parse, you'll wish you had a better parser.  I haven't found a JSON parser that will actually spit out line numbers and context for errors.

With XML, it's easy to combine multiple grammars (think embedding GEGL ops into another XML document).  It has a validation language (two of them, in fact; yes, they have warts, but they do actually work for most things).  It's easier for new brains to look at (though slower for familiar brains).  It's more self-describing, for those who expect their file format to be produced or consumed by many other programs.  It's amazing how important strict specification can be when you use a file as an interchange format, and XML is much better at this than most other options.
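
To make the "combining grammars" point concrete, here is a rough sketch of what embedding GEGL ops into a hypothetical host document via XML namespaces could look like.  The namespace URIs, element names, and attributes below are made up for illustration; they are not GEGL's actual XML schema.

<document xmlns="http://example.org/my-app"
          xmlns:gegl="http://example.org/gegl-ops">
  <page title="Cover">
    <!-- the host format carries an embedded GEGL graph -->
    <gegl:graph>
      <gegl:node id="load" operation="load" path="cover.png"/>
      <gegl:node id="blur" operation="gaussian-blur" std-dev="2.0"/>
      <gegl:edge from="load" to="blur"/>
    </gegl:graph>
  </page>
</document>

Because the two vocabularies live in separate namespaces, each one can be validated against its own schema without either format having to know anything about the other.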

Anyways, if you expect your serialization to be temporary (like a wire format), to need fast parsing on a huge variety of hardware in languages without a byte array (JS), or to be produced and consumed only by your own application, then JSON (or BSON, or protocol buffers) seems like a good choice.  If you're going for more of an interchange format, stick with XML.

Thus I would strongly suggest using XML for this.

Also, as far as structure goes, if you want to represent a general graph, you can draw inspiration from DOT, the language of Graphviz.  There is also GraphML.  You could frankly use GraphML straight out of the box, though it has lots of features you're probably not interested in.

The general structure is usually:

<graph>
  <!-- graph attributes -->
  <node />
  <node />
  <node />
  <edge />
  <edge />
  <edge />
</graph>

So you don't try to put a tree structure in the markup at all.  It's just a flat list of nodes and edges.
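
As a purely hypothetical example (the operation names and attributes are illustrative, not a proposed GEGL schema), a small processing chain flattened this way could look like:

<graph>
  <node id="loader"  operation="load"  path="input.png"/>
  <node id="sharpen" operation="unsharp-mask"/>
  <node id="writer"  operation="save"  path="output.png"/>
  <edge from="loader"  to="sharpen"/>
  <edge from="sharpen" to="writer"/>
</graph>

Multiple inputs, fan-out, and disconnected subgraphs all come for free, because the edges carry the structure instead of the element nesting.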

--
Daniel
