Re: How do you best store structured data in git repositories?


On Wed, Dec 02, 2009 at 04:17:10PM -0500, Avery Pennarun wrote:
> On Wed, Dec 2, 2009 at 4:08 PM, Sebastian Setzer
> <sebastianspublicaddress@xxxxxxxxxxxxxx> wrote:
> > Do you store everything in a single file and configure git to use
> > special diff- and merge-tools?
> > Do you use XML for this purpose?
> 
> XML is terrible for most data storage purposes.  Data exchange, maybe,
> but IMHO the best thing you can do when you get XML data is to put it
> in some other format ASAP.

I agree 100%.

JSON's not too bad for data structures and is known to
be friendly to XML expats.

http://json.org/


> That said, however, you should still try to make your files as stable
> as possible, because:
> 
> - If your program outputs the data in random order, it's just being
> sloppy anyway
> 
> - 'git diff' doesn't work usefully otherwise (for examining the data
> and debugging)


If you are using Python + simplejson, passing the
sort_keys=True flag ensures that your data is stable,
since dictionary keys will always be written out in a
deterministic order.
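
A minimal sketch of that idea, assuming the standard library json
module (it accepts the same sort_keys flag as simplejson).  The
helper name dump_stable and the sample data are made up for
illustration; indent=2 is an extra choice here so that each key
lands on its own line for line-based diffs:

```python
import json


def dump_stable(data):
    """Serialize data so that equal structures give identical text."""
    # sort_keys=True writes dictionary keys in sorted order, so the
    # output does not depend on the order the keys were inserted in.
    # indent=2 puts each key on its own line, which keeps diffs small.
    return json.dumps(data, sort_keys=True, indent=2) + "\n"


# The same data built in a different key order serializes identically.
first = {"name": "widget", "id": 7, "tags": ["x", "y"]}
second = {"tags": ["x", "y"], "id": 7, "name": "widget"}
assert dump_stable(first) == dump_stable(second)
```

Note that only dictionary keys are sorted; lists keep the order the
program gives them, so those still need to be produced consistently.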

Since I mentioned JSON and git in the same email then I might as
well also mention an old UGFWIINI candidate:

http://www.ordecon.com/2009/04/22/is-git-more-than-just-a-version-control-system/


Lastly, BERT might not be a good choice for storing inside
of a git repository, but it is a nice format for representing
data structures:

http://github.com/blog/531-introducing-bert-and-bert-rpc


We've been using git for tracking changes to a large set of
JSON files at $dayjob and it's worked out pretty well.

I'd suggest that you try to break your data up into multiple
files if possible.  As someone else mentioned, it's often
easier to diff and merge stuff if you structure things in a
merge-friendly way.
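
As a rough sketch of what splitting might look like, assuming the
data is a dictionary of records keyed by an id that is safe to use
as a file name (the function name and the one-file-per-record
layout are only an illustration, not a description of any
particular setup):

```python
import json
import os


def write_one_file_per_record(records, out_dir):
    """Write each record to its own JSON file and return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for record_id in sorted(records):
        # Assumption: record_id is a plain string usable as a file name.
        path = os.path.join(out_dir, "%s.json" % record_id)
        with open(path, "w") as f:
            # Stable key order and one key per line keep diffs readable.
            json.dump(records[record_id], f, sort_keys=True, indent=2)
            f.write("\n")
        paths.append(path)
    return paths
```

With a layout like this, two people editing different records touch
different files, so git can merge their changes without having to
merge file contents at all.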

One feature that we've implemented is file referencing,
where a data file can "#include" another data file.  That
kind of thing can make life easier if you foresee having
a lot of common data that can be shared amongst the
various files.
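
To give a flavour of file referencing, here is a hypothetical
sketch.  It assumes a convention where a JSON object may carry an
"#include" key whose value is a path relative to the including
file, and where local keys override included ones.  The key name,
the merge rule and the function names are all assumptions made for
this example:

```python
import json
import os

INCLUDE_KEY = "#include"  # hypothetical marker key, chosen for this sketch


def load_with_includes(path, _seen=None):
    """Load a JSON file, expanding "#include" references recursively."""
    seen = set() if _seen is None else _seen
    real = os.path.realpath(path)
    if real in seen:
        raise ValueError("include cycle detected at %s" % path)
    with open(real) as f:
        data = json.load(f)
    return _resolve(data, os.path.dirname(real), seen | {real})


def _resolve(node, base_dir, seen):
    if isinstance(node, list):
        return [_resolve(item, base_dir, seen) for item in node]
    if not isinstance(node, dict):
        return node
    merged = {}
    if INCLUDE_KEY in node:
        # Included paths are taken relative to the including file.
        target = os.path.join(base_dir, node[INCLUDE_KEY])
        included = load_with_includes(target, seen)
        if not isinstance(included, dict):
            raise ValueError("%s must contain a JSON object" % target)
        merged.update(included)
    for key, value in node.items():
        if key != INCLUDE_KEY:
            # Local keys override anything pulled in by the include.
            merged[key] = _resolve(value, base_dir, seen)
    return merged
```

For example, a part.json containing {"#include": "common.json",
"color": "blue"} would load as the contents of common.json with
"color" replaced by "blue".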

-- 
		David