Re: ORM integration?

Peter Hunsberger <peter.hunsberger@xxxxxxxxx> · Fri, 13 Aug 2010 10:11:22 -0500

On Fri, Aug 13, 2010 at 1:52 AM, Craig Ringer
<craig@xxxxxxxxxxxxxxxxxxxxx> wrote:
> Hi folks
>
> I know many people here loathe ORM systems. I'm one of them, but I still
> use them when they appear to be appropriate, despite their problems.
>
> In the process I've come to realize that ORMs in general have a couple
> of issues that could be avoided with some help from the database.
> Specifically:
>
> - They need to fetch graphs of records that keep records associated
>  with peers in relationships. Essentially they want nested sets.
>  Ideally they need to be able to do this with a WHERE or
>  LIMIT/OFFSET on the "root" relation, so they can do batch
>  fetching of blocks of records in contexts where holding a
>  transaction open (permitting cursor use) isn't appropriate.
>
>  Currently ORMs do this by doing multiple LEFT OUTER JOINs and
>  post-processing the results to eliminate duplication of data.
>  Needless to say this is incredibly inefficient. It also makes
>  using LIMIT/OFFSET nigh impossible, so they often fetch the whole
>  data set into local memory (!!) even if the app only asks for a
>  small fragment of it.
>
>  A native way to fetch a query's results as a nested set, as
>  (say) XML or JSON, would potentially be a huge bonus if ORM
>  systems could be convinced to use it. It's potentially possible
>  already with use of nested subqueries and array_agg.
>
>  I've even been idly playing with the idea of using
>  PL/Java to build a Java object graph in memory and send that
>  to the client!
>
>  I'm wondering  if anyone's been down  the path of building a
>  more ORM-friendly relation graph representation database-side
>  and sending it to the client.
>
>
> - ORMs tend to lack cache coherence. They generally maintain a cache
>  of records fetched from the database - partly because their fetching
>  is so expensive (as noted above) and partly just as a general
>  performance optimisation. The problem is that the database has no
>  way to invalidate the ORM's cache of a particular record when changes
>  are made to it in the database, so the ORM's cache tends to get out
>  of sync with the database.
>
<snip/>

We do not use an ORM, but we do do very generalized graphs of metadata
in the database, with the end results very similar to what you
describe.  I've written about this a bit on this list and won't repeat
myself other than to say once more, that we use the set / subset model
popularized by Joe Celko.  We decompose the graphs into trees and use
in-order tree traversal with individual tree nodes sharing a common FK
to identify them in the forest.  A single tree is a single flat query.
Our largest individual trees run to 60,000 nodes or so.  Graphs are
built and managed in memory. We do cache because, at the moment, the
construction of the metadata that is pointed to by these trees is
expensive (long story, unrelated to the issues you're talking about).
However, there is a task to optimize that and I suspect that we won't
need caching once we're done (though we may keep it since it is in
place and working).  All tree management is through the same code that
creates the in memory graphs in the first place so cache invalidation
is not an issue.

In the end we have a sort of ORM, however, we can add new tables to
our schema (with some fairly strict limitations) and define the
metadata to manage the CRUD operations on it all at run time.  Is this
applicable in general?  Probably but the code is not trivial...
Could we benefit from hooks into the database for this?  Perhaps,
datatypes that explicitly knew they had the set / subset relationship
to each other could allow moving big chunks (if not all) of the tree
management out of the Java code and into some form of operation within
the database.  Thinking out loud here, I guess that could be generally
useful, it seems you'd end up with a sort of ltree but I think to move
the generalization this far into the database means that the tree
nodes are managing (possibly poorly defined) FK relationships.  That's
probably ok for this kind of stuff, there are ways to get the
equivalent of  strong typing back either on the DB side or at run
time.  You're essentially end up hacking a relational database to
support network database type operations, so to the extent that people
need that you've got something useful...

-- 
Peter Hunsberger

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general