On Thu, 18 May 2017 12:04:42 -0500
Kevin Grittner <kgrittn@xxxxxxxxx> wrote:

> On Thu, May 18, 2017 at 11:07 AM, Karl O. Pinc <kop@xxxxxxxx> wrote:
>
> > ... Does PG
> > now pay attention to database in its SSI implementation?
>
> Well, it pays attention as far as the scope of each lock, but there
> is only one variable to track how far back the oldest transaction ID
> for a running serializable transaction goes, which is used in
> cleanup of old locks.  I see your point, and it might be feasible to
> change that to a list or map that tracks it by database; but I don't
> even have a gut feel estimate for the scale of such work without
> investigating it.  Just out of curiosity, what is the reason you
> don't move the production and test databases to separate instances?
> If nothing else, an extremely long-running transaction in one
> database can lead to bloat in others.

Ultimately it was easier to change the transaction isolation level to
repeatable read (or lower) for the transactions known to take a long
time.  (A sketch of what that looks like is in the P.S. below.)  Any
concurrency issues (which have never arisen) are handled at the human
level.

> > Thanks for the help and apologies if I'm not framing
> > the question perfectly.  It's not often I think about
> > this.
>
> No sweat -- your concern/question is perfectly clear.  It's the
> first time I've heard of someone with this particular issue, so at
> this point I'm inclined to recommend the workaround of using a
> separate cluster; but if we get other reports it might be worth
> adding to the list of enhancements that SSI could use.

Understood.

To give you an idea of the use-case, we're using Chado
(http://gmod.org/wiki/Chado), a PG database design that stores
genetic information.  The datasets being what they are, they are big
and take a long time to load.  This is especially true because the
Chado designers are enamored of ontologies and knowledge
representation, so there are a lot of tables where, instead of
separate columns for different kinds of data, there are simply two
columns, "type" and "data".  The type is an ontology entry and tells
you what the data is.  This makes for ugly queries in the process of
loading data, and ugly SQL in general (see the P.P.S. for a sketch).

So loading genetic data sets is slow.  Not really an issue, as we
don't anticipate loading a data set more often than once every 6
months or a year.  (Although non-genetic data is loaded frequently.)
The workflow is to load data first into the test db, possibly
multiple times until satisfied, and then load the data into
production.  It is very handy, especially in production, to load all
related data in a single transaction in case something goes wrong.

There are many non-optimal elements, not the least of which is that
it's not clear how much utility there is in storing genetic datasets
in a relational db alongside our non-genetic data.  (We are finding
out.)

Thanks for the help.

Karl <kop@xxxxxxxx>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein
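
P.S.  For anyone who finds this thread later, here is roughly what
the isolation-level workaround looks like.  It is only a sketch; the
staging table and file path are made up:

    -- Run the long bulk load at REPEATABLE READ instead of
    -- SERIALIZABLE, so it takes no part in SSI and doesn't hold
    -- back cleanup of old predicate locks.
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    COPY staging_features FROM '/tmp/dataset.tsv';  -- hypothetical
    -- ... transform and insert into the Chado tables ...
    COMMIT;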
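
P.P.S.  And an example of the kind of query the "type"/"data" layout
forces on you.  The names follow the Chado featureprop pattern, but
treat the details as illustrative rather than gospel:

    -- Every property lookup has to join through the ontology term
    -- table just to find out what kind of value it is looking at.
    SELECT f.uniquename, fp.value
      FROM feature f
      JOIN featureprop fp ON fp.feature_id = f.feature_id
      JOIN cvterm t       ON t.cvterm_id   = fp.type_id
     WHERE t.name = 'description';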