Search Postgresql Archives

Re: Research and EAV models

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 23 Oct 2009, Leif B. Kristensen wrote:

I'm a researcher type, and I've made an EAV model that suits me well in my genealogy research. How can you associate an essentially unknown number of sundry "events" to a "person" without an EAV model?

CREATE TABLE events(person text,key text,value text);

You seem to think there's some sort of fundamental difference between the two models. There isn't. The tuples of a standard relational row are simply a fixed set of key/value pairs, but you can always store EAV data explicitly like this simple example. Similarly, you can always decompose relational data into an equivalent EAV set, where the keys are the column names.

The difference between the two data models is that knowing the keys in advance allows you to do optimizations when looking up the data that result in faster queries than any really generic system can do. Trying to optimize queries against huge EAV sets will just end up reinventing concepts like indexes if you want them to run well, and then you're back to having to specify which are the important keys/columns in advance.

Your thinking this is a business/research distinction isn't right, it comes down to the size of the data set and how fast/expressive the queries against them are. In the research realm, you can watch this struggle play in things like bioinformatics, where there's giant stacks of very similar data they need to sort through. What often ends up happening is you have scientists start with a simple EAV system, then watch it completely fall down under real-world load and complicated queries once it gets past prototype. Then they try to patch it for a while by reinventing SQL query and storage concepts (badly).

Eventually someone with traditional database background comes along, notes that the data you really need is for the most part predetermined, rewrites that into something more amenable to standard SQL, and then the whole thing performs better for that subset. But now you've got stuff like ALTER TABLE to add a column every time you want to track something new, and people don't like that when the schema is still fluid. So they start putting stuff into EAV stores for their next project...and the cycle begins anew.

Ultimately you can be really flexible in how your store your data, or you can get good query performance, but it's quite hard to do both.

--
* Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux