Re: Research and EAV models

Greg Smith <gsmith@xxxxxxxxxxxxx> · Fri, 23 Oct 2009 18:30:19 -0400 (EDT)

On Fri, 23 Oct 2009, Leif B. Kristensen wrote:

I'm a researcher type, and I've made an EAV model that suits me well in 
my genealogy research. How can you associate an essentially unknown 
number of sundry "events" to a "person" without an EAV model?

CREATE TABLE events(person text,key text,value text);

You seem to think there's some sort of fundamental difference between the 
two models.  There isn't.  The tuples of a standard relational row are 
simply a fixed set of key/value pairs, but you can always store EAV data 
explicitly like this simple example.  Similarly, you can always decompose 
relational data into an equivalent EAV set, where the keys are the column 
names.

The difference between the two data models is that knowing the keys in 
advance allows you to do optimizations when looking up the data that 
result in faster queries than any really generic system can do.  Trying to 
optimize queries against huge EAV sets will just end up reinventing 
concepts like indexes if you want them to run well, and then you're back 
to having to specify which are the important keys/columns in advance.

Your thinking this is a business/research distinction isn't right, it 
comes down to the size of the data set and how fast/expressive the queries 
against them are.  In the research realm, you can watch this struggle play 
in things like bioinformatics, where there's giant stacks of very similar 
data they need to sort through.  What often ends up happening is you have 
scientists start with a simple EAV system, then watch it completely fall 
down under real-world load and complicated queries once it gets past 
prototype.  Then they try to patch it for a while by reinventing SQL query 
and storage concepts (badly).

Eventually someone with traditional database background comes along, notes 
that the data you really need is for the most part predetermined, rewrites 
that into something more amenable to standard SQL, and then the whole 
thing performs better for that subset.  But now you've got stuff like 
ALTER TABLE to add a column every time you want to track something new, 
and people don't like that when the schema is still fluid.  So they start 
putting stuff into EAV stores for their next project...and the cycle 
begins anew.

Ultimately you can be really flexible in how your store your data, or you 
can get good query performance, but it's quite hard to do both.

--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general