On Tue, Mar 21, 2017 at 4:24 PM, James Parks <james.parks@xxxxxxxxxx> wrote: > ... and here's how long it takes to read all of the rows: > database=> select max(an unindexed bigint column) from a; > -[ RECORD 1 ]-------- > max | <some number> > Time: 10624.368 ms > > Running this another time immediately afterward (to show the cached speed) > returns: > Time: 13782.363 ms > > If I go to a separate database cluster that has an equivalent schema, and > roughly equivalent table a (+- 2% on the number of rows), the above queries > look more like this: > > meraki_shard_production=> vacuum verbose a; > INFO: vacuuming "public.a" > INFO: index "a_pkey" now contains 42171 row versions in 162 pages > DETAIL: 0 index row versions were removed. > 0 index pages have been deleted, 0 are currently reusable. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > INFO: "a": found 487 removable, 42286 nonremovable row versions in 7809 out > of 7853 pages > DETAIL: 373 dead row versions cannot be removed yet. > There were 42436 unused item pointers. > 0 pages are entirely empty. > CPU 0.00s/0.02u sec elapsed 0.01 sec. > INFO: vacuuming "pg_toast.pg_toast_19037" > INFO: index "pg_toast_19037_index" now contains 57 row versions in 2 pages > DETAIL: 0 index row versions were removed. > 0 index pages have been deleted, 0 are currently reusable. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > INFO: "pg_toast_19037": found 0 removable, 57 nonremovable row versions in > 12 out of 12 pages > DETAIL: 0 dead row versions cannot be removed yet. > There were 0 unused item pointers. > 0 pages are entirely empty. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > VACUUM > Time: 32.890 ms > > database=> select max(the same unindexed bigint column) from a; > max > ----------------- > <some number> > (1 row) > Time: 16.696 ms > (The second iteration takes 15.320 ms) > > So, the way I see it, my problem boils down to table "A" getting roughly > 100-1000x slower when it gets roughly 20-50x bigger (depending if you > measure in pages or tuples). Unfortunately, in my use case, table "A" acts > as a join table for a lot of aspects of our company's webapp. Every 10 > minutes, the table is queried for 35 million rows via sequential scan (~800 > seq scans per minute, ~1.3 per second on average), and 6.5 million rows via > index lookup. When a sequential scan over 40k rows takes less than 1 second, > everything is fine -- when it takes 10+ seconds the database starts to slow > down significantly. Thankfully, queries can share sequential scans, but you > can imagine how the responsiveness of the webapp might suffer as a > consequence. There's also the secondary effect that, should the query on B > complete, there now exist many queries against A (and other related tables) > that are slow enough to potentially increase the size of A even further. It > is not uncommon for queries involving A to start taking upwards of 30 > minutes to complete, when they usually complete in roughly 300ms, after some > maintenance query against B has completed. > > Our go-to solution has been to detect and stop these maintenance queries if > they take too long, and then to CLUSTER table A. This puts a cap on how long > any maintenance query can take -- down to somewhere around 1 hour. > > And thus my query to you guys: > > What can I do to keep running long maintenance operations on large tables > (SELECTing significant fractions of B, DELETEing significant fractions of B, > running VACUUM FULL on B) without denying other Postgresql backends their > ability to efficiently query table A? Or, in other words, how do I avoid > incurring the cost of transaction isolation for queries against B on a > case-by-case basis? > > Anything is on the table for implementation: > - moving tables to a different database / cluster / completely different > DBMS system > - designing an extension to tune either sets of queries > - partitioning tables > - etc > ... although the simpler the better. If you were in this position, what > would you do? > > Regards, > James You're experiencing bloat because the transaction on B is preventing the xid horizon from moving forward, thus dead tuples from A cannot be reclaimed in case the transaction on B decides to query them. There's only one "easy" solution for this as far as I know, and it is to run your long-running queries on a hot standby. That certainly works for most read-only workloads, especially pg_dump. -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance