Re: Research and EAV models

Johan Nel <johan.nel@xxxxxxxxxxxx> · Sat, 24 Oct 2009 09:37:11 +0200

Hi Peter,

I agree 100% with you.  EAV can be a good "middle of the road" appoach as 
you suggest.

Peter Hunsberger wrote:
My take on this, for the research world, is to not go pure EAV, but
rather normalize by some more generic concepts within the domain.  Eg.
"measurement", or "evaluation", etc. You might ultimately end up with
a sort of EAV model, but the "V" portion is strongly typed within the
database and you're not trying to cast a string into 20 conventional
data types. This still requires rigorous metadata management on the EA
side of the EAV model, but you can tackle that in many ways.

SQL isn't the be-all and end-all of data storage.  It does relational
stuff well, and other stuff poorly.

You can build variations on EAV that are closer to a regular
relational schema.  These don't necessarily work well or poorly but
often, at least in the research world, the middle ground is good
enough.  You are after all, talking about people who spit out MySQL
databases at the drop of a hat....

I use a very similar approach in managing meta-data, normalize the data 
that can be normalized and use EAV for the rest.  Potentially eliminating 
as much as possible text search, however in some scenarios it might be 
necessary but an additional where on some normalized columns can help a 
lot with performance.

One of my application meta-data frameworks uses only two tables to store 
all meta-data about an application and have basically the following structure:

CREATE TABLE controls (
ctrl_no SERIAL PRIMARY KEY NOT NULL,
app_id varchar(30) NOT NULL,
ctrl_type varchar(30) NOT NULL,
ctrl_id varchar(30) NOT NULL,
ctrl_property text, -- This can be also hstore to add some intelligence
CONSTRAINT controls_unique UNIQUE (app_id, ctrl_type, ctrl_id));

CREATE TABLE members (
ctrlmember_no SERIAL PRIMARY KEY NOT NULL,
ctrl_no INTEGER NOT NULL REFERENCES controls(ctrl_no),
member_no INTEGER NOT NULL REFERENCES controls(ctrl_no),
member_type varchar(30) NOT NULL,
member_property text, -- This can be a hstore to add more intelligence
CONSTRAINT member_unique UNIQUE (ctrl_no, member_no));

ctrl_property is used to store meta-data based on ctrl_type.

member_property stores meta-data based on member_type and/or overriding 
default ctrl_property values based on the parent ctrl_no it is associated 
with.

Without this approach I would have to create more than 100 tables if I 
want to fully normalize this meta-data.

Many people have indicated that I am actually duplicating many of the 
catalog features of a relational database, and I agree.  However, it 
allows me to port this meta-data onto any user specified RDBMS without 
having to worry about database specifics.

With two [recursive] queries on the above two tables I can answer most of 
the fundamental questions regarding the meta-data:

1. Show me the members (parent -> child) of a specific feature.
2. Show me the owners (child -> parent) of a specific feature.

Extending the above, it allows for an easy development plan for writing a 
generic application framework that not only manages the meta-data, but 
also allows the same framework to run/create a user interface on the fly 
by a couple of nuances on the above two queries and using the EAV 
(%_property) columns to supply all the default properties and behaviour of 
the application.

Changing the behavior of an application becomes primarily a database 
management issue, a lot less application upgrade management in a 
distributed environment.

To come back to the original message.  Yes there are a place for EAV, not 
only in Research but even in Business data.  I have a Environmental 
software scenario, and EAV on the business data provide me the edge 
against my competitors.  Lot less time needed to implement new features 
compared to doing the normal functional decomposition and system 
development life cycle.

In conclusion, I include a extract from an article by Dan Appleton 
(Datamation, 1983) that my approach is based on:

“The nature of end-user software development and maintenance will change 
radically over the next five years simply because 500 000 programmers will 
not be able to rewrite $400 billion of existing software (which is hostage 
to a seven- to 10-year life cycle).  They'll be further burdened by those 
new applications in the known backlog, as well as by those applications in 
the hidden backlog.  To solve the problem, dp (data processing) shops must 
improve productivity in generating end-user software and provide end-users 
with the means of generating their own software without creating 
anarchy...  The answer to this is a data-driven (meta-data) prototyping 
approach, and companies that do not move smoothly in this direction will 
either drown in their own information pollution or loose millions on 
systems that are late, cost too much, and atrophy too quickly.”

Regards,

Johan Nel
Pretoria, South Africa.

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general