On Fri, Oct 23, 2009 at 5:30 PM, Greg Smith <gsmith@xxxxxxxxxxxxx> wrote:
>
> Your thinking this is a business/research distinction isn't right, it comes
> down to the size of the data set and how fast/expressive the queries against
> them are. In the research realm, you can watch this struggle play out in
> things like bioinformatics, where there are giant stacks of very similar
> data they need to sort through. What often ends up happening is you have
> scientists start with a simple EAV system, then watch it completely fall
> down under real-world load and complicated queries once it gets past
> prototype. Then they try to patch it for a while by reinventing SQL query
> and storage concepts (badly).

I've got little idea what the OP is really dealing with, but there is a
fundamental difference between the research world and the more traditional
business world. On the research side the data models are not well known in
advance, there are few established best practices, and even the data types
are not always well known.

In a conventional schema we would have about 2,000+ tables with 20,000+
columns. This is a 6-year-old system, with 75% of our metadata having been
added over the last 3 years. I expect growth to slow over the next 3 years,
but 25% growth would not be unreasonable. The current physical schema is
some 40 tables with maybe 300 columns (we also render about 6,000 customized
presentations of the data and manage workflow and validation with this
schema). Performance tuning can be a pain, but we've mostly got that
mastered at this stage in the game: it's a trade-off, but using a
conventional schema would have consumed far more resources than we've spent
over the last 6 years. The first two versions of the system used a
conventional schema, and new research protocols were taking 6 months to a
year to add. We can now add simple ones in a day or two and complex ones in
a couple of months (the largest having some 60,000+ pieces of metadata,
including validation rules and workflow).

> Eventually someone with a traditional database background comes along,
> notes that the data you really need is for the most part predetermined,
> rewrites that into something more amenable to standard SQL, and then the
> whole thing performs better for that subset. But now you've got stuff like
> ALTER TABLE to add a column every time you want to track something new, and
> people don't like that when the schema is still fluid. So they start
> putting stuff into EAV stores for their next project... and the cycle
> begins anew.
>
> Ultimately you can be really flexible in how you store your data, or you
> can get good query performance, but it's quite hard to do both.

Not much argument there! However, it's a three-way trade-off: such a schema
can also get good performance for a small number of users and/or a small
amount of data (we're about 300 users and 45 GB of total data).

--
Peter Hunsberger
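
[Editor's note: for readers unfamiliar with the EAV pattern being debated
above, here is a minimal, hypothetical sketch of an EAV-style layout in
PostgreSQL. The table and column names are invented for illustration and are
not the poster's actual 40-table schema.]

    -- Hypothetical EAV-style layout, for illustration only.
    CREATE TABLE entity (
        entity_id   serial PRIMARY KEY,
        entity_type text NOT NULL
    );

    CREATE TABLE attribute (
        attribute_id serial PRIMARY KEY,
        name         text NOT NULL,
        data_type    text NOT NULL       -- e.g. 'text', 'numeric', 'date'
    );

    CREATE TABLE value (
        entity_id    integer NOT NULL REFERENCES entity,
        attribute_id integer NOT NULL REFERENCES attribute,
        value_text   text,               -- stored as text, cast on read
        PRIMARY KEY (entity_id, attribute_id)
    );

    -- Adding a new "column" is just a row in attribute, which is why the
    -- schema can stay fluid without ALTER TABLE -- and also why complex
    -- queries end up reinventing SQL concepts on top of these three tables.
    INSERT INTO attribute (name, data_type) VALUES ('sample_weight', 'numeric');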