On Fri, Aug 13, 2010 at 1:52 AM, Craig Ringer <craig@xxxxxxxxxxxxxxxxxxxxx> wrote: > Hi folks > > I know many people here loathe ORM systems. I'm one of them, but I still > use them when they appear to be appropriate, despite their problems. > > In the process I've come to realize that ORMs in general have a couple > of issues that could be avoided with some help from the database. > Specifically: > > - They need to fetch graphs of records that keep records associated > with peers in relationships. Essentially they want nested sets. > Ideally they need to be able to do this with a WHERE or > LIMIT/OFFSET on the "root" relation, so they can do batch > fetching of blocks of records in contexts where holding a > transaction open (permitting cursor use) isn't appropriate. > > Currently ORMs do this by doing multiple LEFT OUTER JOINs and > post-processing the results to eliminate duplication of data. > Needless to say this is incredibly inefficient. It also makes > using LIMIT/OFFSET nigh impossible, so they often fetch the whole > data set into local memory (!!) even if the app only asks for a > small fragment of it. > > A native way to fetch a query's results as a nested set, as > (say) XML or JSON, would potentially be a huge bonus if ORM > systems could be convinced to use it. It's potentially possible > already with use of nested subqueries and array_agg. > > I've even been idly playing with the idea of using > PL/Java to build a Java object graph in memory and send that > to the client! > > I'm wondering if anyone's been down the path of building a > more ORM-friendly relation graph representation database-side > and sending it to the client. > > > - ORMs tend to lack cache coherence. They generally maintain a cache > of records fetched from the database - partly because their fetching > is so expensive (as noted above) and partly just as a general > performance optimisation. The problem is that the database has no > way to invalidate the ORM's cache of a particular record when changes > are made to it in the database, so the ORM's cache tends to get out > of sync with the database. > <snip/> We do not use an ORM, but we do do very generalized graphs of metadata in the database, with the end results very similar to what you describe. I've written about this a bit on this list and won't repeat myself other than to say once more, that we use the set / subset model popularized by Joe Celko. We decompose the graphs into trees and use in-order tree traversal with individual tree nodes sharing a common FK to identify them in the forest. A single tree is a single flat query. Our largest individual trees run to 60,000 nodes or so. Graphs are built and managed in memory. We do cache because, at the moment, the construction of the metadata that is pointed to by these trees is expensive (long story, unrelated to the issues you're talking about). However, there is a task to optimize that and I suspect that we won't need caching once we're done (though we may keep it since it is in place and working). All tree management is through the same code that creates the in memory graphs in the first place so cache invalidation is not an issue. In the end we have a sort of ORM, however, we can add new tables to our schema (with some fairly strict limitations) and define the metadata to manage the CRUD operations on it all at run time. Is this applicable in general? Probably but the code is not trivial... Could we benefit from hooks into the database for this? Perhaps, datatypes that explicitly knew they had the set / subset relationship to each other could allow moving big chunks (if not all) of the tree management out of the Java code and into some form of operation within the database. Thinking out loud here, I guess that could be generally useful, it seems you'd end up with a sort of ltree but I think to move the generalization this far into the database means that the tree nodes are managing (possibly poorly defined) FK relationships. That's probably ok for this kind of stuff, there are ways to get the equivalent of strong typing back either on the DB side or at run time. You're essentially end up hacking a relational database to support network database type operations, so to the extent that people need that you've got something useful... -- Peter Hunsberger -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general