On Sat, May 23, 2009 at 12:27 AM, Conrad Lender <crlender@xxxxxxxxx> wrote: > On 23/05/09 06:05, Rodrigo E. De León Plicet wrote: >>>> Is there a better way? >>> >>> Yeah, natural keys. >> >> +1. >> >> Also, what Ben described reeks of EAV. >> >> Ben, please read: >> >> http://joecelkothesqlapprentice.blogspot.com/2006/04/using-one-table-vs-many.html > > Ah yes, the great Celko. I had the honor of being "lectured" by him on a > similar topic, and to tell the truth, I'm still unconvinced. I had > inherited a generally well-designed database with a clear relational > structure, apart from one table about which I was uncertain. This table > collected several groups of attributes/values that were not necessarily > related to each other, and could (should?) be split into separate > tables. It was nowhere near as bad as the example in the linked article, > and we did have check constraints in place. The values were used to > classify and sometimes "tag" rows in other tables. I hesitated to break > this table up, because that would have meant > > - doubling the number of existing tables (70 to 140) > - significant code changes all over the place > - having a lot of very small tables with identical layout > "id" (surrogate key) > "entry" (string; used to refer to this row by the application) > "label" (string; visible on the front end) > > Here's an example of the value groups that were contained in the table: > > fax status: > pending, active, sent, error > department: > office, accounting, it, legal, experts > deadline type: > official, unofficial > ... > > Using the "entry" field as natural keys would have been possible in some > places, but not everywhere, and the labels still had to be editable by > the users. Some of the "entry" strings were rather verbose (>40 > characters), which would have made the other tables a lot larger. It > also didn't "feel right" to have so many tiny tables with only 2-5 rows. > > Is it really advisable to put all these values into 70 separate tables > with the exact same layout? I don't quite see the benefit. I've used EAV for some stuff, and it definitely has its place, but if the data set in an EAV tableset is large enough, it becomes unweildy quickly. For large amounts of data, it's usually better to bite the bullet and make a table for it. Since the data in EAV tables have no strong meaning in and of themself, it's often hard for another developer to grok what someone was doing with those data. However, the whole point of using natural keys is that if you need the attributes from the master table that's FK'ed to a natural key lookup table, then you don't have to actually hit any other table, just the main one. So, small to medium finite and well defined sets like states etc. are coherent, but cost little retrieve. Sure there's bound to be large less well defined sets that you don't always want, and for those a surrogate key works just fine. PostgreSQL doesn't do as well as some dbs with huge star schemas, where everything is in its own table and you always join for everything, once the number of tables gets really large. I have a great deal of respect for Celko. I don't always agree with everything he says, but most of it makes a lot of sense to me. -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general