Re: Serializable isolation -- are predicate locks still held across all databases?

"Karl O. Pinc" <kop@xxxxxxxx> · Fri, 19 May 2017 01:52:00 -0500

On Thu, 18 May 2017 12:04:42 -0500
Kevin Grittner <kgrittn@xxxxxxxxx> wrote:

> On Thu, May 18, 2017 at 11:07 AM, Karl O. Pinc <kop@xxxxxxxx> wrote:
> 
> > ...  Does PG
> > now pay attention to database in it's SSI implementation?  
> 
> Well, it pays attention as far as the scope of each lock, but there
> is only one variable to track how far back the oldest transaction ID
> for a running serializable transaction goes, which is used in
> cleanup of old locks.  I see your point, and it might be feasible to
> change that to a list or map that tracks it by database; but I don't
> even have a gut feel estimate for the scale of such work without
> investigating it.  Just out of curiosity, what is the reason you
> don't move the production and test databases to separate instances?
> If nothing else, extremely long-running transaction in one database
> can lead to bloat in others.

Ultimately it was easier to change the transaction isolation level
to repeatable read (or lower) for the transactions known to take 
a long time.  Any concurrency issues (which have never arisen) 
are handled at the human level.

> > Thanks for the help and apologies if I'm not framing
> > the question perfectly.  It's not often I think about
> > this.  
> 
> No sweat -- your concern/question is perfectly clear.  It's the
> first time I've heard of someone with this particular issue, so at
> this point I'm inclined to recommend the workaround of using a
> separate cluster; but if we get other reports it might be worth
> adding to the list of enhancements that SSI could use.

Understood.

To give you an idea of the use-case, we're using Chado
(http://gmod.org/wiki/Chado) a PG database design
which stores genetic information.  The datasets being what
they are, they are big and take a long time to load.
This is especially true because the Chado designers
are enamored of ontologies and knowledge representation
and so there's a lot of tables where, instead of
having separate columns for different types of data
there's simply 2 columns "type" and "data".  The
type is an oncology entry and tells you want the
data is.  This makes for ugly queries in the process
of loading data (and ugly SQL in general).

So loading genetic data sets is slow.  Not really an issue
as there's no anticipation of loading a data set more
than every 6 months or a year.  (Although non-genetic
data is loaded frequently.)

The workflow is to load data first into the
test db, possibly multiple times until satisfied.
Then load the data into production.   It is very handy,
especially in production, to load all related data
in a single transaction in the event something
goes wrong.

There are many non-optimal elements, not the least
of which is that it's not clear how much utility
there is in storing genetic datasets
in a relational db along side our non-genetic data.
(We are finding out.)

Thanks for the help.

Karl <kop@xxxxxxxx>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general