On Wed, Dec 23, 2015 at 6:00 AM, Scott Neville
<scott.neville@xxxxxxxxxxxxxxxxxxxxxxx> wrote:

> We have a database that for some reason has started to be really
> slow at planning all queries.  The database is running version
> 9.4.2 since July 28th (it was freshly installed then - compiled
> from source).

http://www.postgresql.org/support/versioning/

9.4.2 contains bugs that can eat your data and leave you with a
corrupted (and possibly unusable) database without warning.  You
should always be ready to recover from your backups, but it's worth
taking extra care about your backups when you choose to run with
such serious known bugs.

> The response time is fairly sporadic, but the quickest plan time
> I have seen (on any query) using explain analyze is 39 ms with an
> execution time of 1 ms; however, we have slow query logging on
> and we are getting queries taking over 6000 ms in the planning
> stage with then only a few ms to execute.  There is nothing
> complex about the queries, so even something like this:
>
>   select max(datetime) from audit;
>
> (where datetime is an indexed field) takes 200 ms to plan and
> 0.5 ms to execute.
>
> The databases are involved in a replication chain, so I have:
>
>   M1 -> S1 -> S2
>
> I have restarted S2 and S1 and this appears to have made the
> problem go away (though for how long...).  S1 has a replication
> slot listed on M1.
>
> The only other thing to note is that while all of the tables are
> big, most of them are not crazy (the one that is most commonly
> selected from has 718,000 rows in it), though there are some very
> big tables which are reaching 325,000,000 rows.  There is quite a
> lot of change too: I estimate about 7,500,000 row changes a day
> on average, but this is also very focused (about 7 million of the
> changes happen on two tables, yet all tables suffer from slow
> query planning).  Most of these changes occur overnight, when the
> bulk changes happen; the rest arrives in a steadier stream through
> the day.  I could understand it more if the execution time was
> slow, but it's the planning time.
>
> Auto-vacuum is turned on and set to 1 worker; in addition to this
> we have a process that runs every night and runs "vacuum analyze"
> on as many tables as it can in a 2-hour period (starting with the
> least-recently-vacuumed tables first).
>
> Just wondering if anyone has any thoughts as to why planning
> takes so long and anything I can do to address the issue.

The most likely explanation, based on the above, is that your
vacuum/analyze regimen is not sufficient to keep up with the
modifications.  You don't give a description of the hardware or
show most of your configuration settings (max_connections would be
particularly interesting), so you may well have other problems; but
I would start by setting all autovacuum settings to their defaults
except for these overrides, and running a VACUUM ANALYZE command
with a superuser login (even if it takes days) while you let the
other load run:

  autovacuum_max_workers = 10
  autovacuum_vacuum_cost_limit = 1000
  autovacuum_work_mem = '1GB'

Once the VACUUM ANALYZE command completes, autovacuum stands a good
chance of keeping up.  If you see all 10 autovacuum processes busy
for more than an hour or two at a time, you might want to increase
autovacuum_vacuum_cost_limit.

Once you have a vacuum regimen that is keeping up, you may want to
run a query to show bloat levels and take extreme measures such as
VACUUM FULL to fix that; but that is pretty pointless without
having a process to prevent a recurrence of the bloat.
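As a starting point for that bloat check, a minimal sketch is
below.  It only looks at the dead-tuple counters in
pg_stat_user_tables, which are estimates maintained by the
statistics collector rather than exact measurements; the
pgstattuple extension or the bloat queries on the PostgreSQL wiki
give more accurate numbers:

  -- Rough bloat indicator: dead tuples per table, worst first.
  -- These counters are estimates, so treat dead_pct as a hint,
  -- not a precise bloat figure.
  SELECT relname,
         n_live_tup,
         n_dead_tup,
         round(100.0 * n_dead_tup
               / nullif(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
         last_vacuum,
         last_autovacuum
    FROM pg_stat_user_tables
   ORDER BY n_dead_tup DESC
   LIMIT 20;

Tables that still show a high dead_pct long after autovacuum has
had time to run are the candidates for VACUUM FULL once routine
vacuuming is keeping up.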
If you are still having problems, please read this and post the
suggested information:

https://wiki.postgresql.org/wiki/Guide_to_reporting_problems

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company