Re: Optimizing around retained tuples

Claudio Freire <klaussfreire@xxxxxxxxx> · Tue, 21 Mar 2017 22:56:34 -0300

On Tue, Mar 21, 2017 at 4:24 PM, James Parks <james.parks@xxxxxxxxxx> wrote:
> ... and here's how long it takes to read all of the rows:
> database=> select max(an unindexed bigint column) from a;
> -[ RECORD 1 ]--------
> max | <some number>
> Time: 10624.368 ms
>
> Running this another time immediately afterward (to show the cached speed)
> returns:
> Time: 13782.363 ms
>
> If I go to a separate database cluster that has an equivalent schema, and
> roughly equivalent table a (+- 2% on the number of rows), the above queries
> look more like this:
>
> meraki_shard_production=> vacuum verbose a;
> INFO:  vacuuming "public.a"
> INFO:  index "a_pkey" now contains 42171 row versions in 162 pages
> DETAIL:  0 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO:  "a": found 487 removable, 42286 nonremovable row versions in 7809 out
> of 7853 pages
> DETAIL:  373 dead row versions cannot be removed yet.
> There were 42436 unused item pointers.
> 0 pages are entirely empty.
> CPU 0.00s/0.02u sec elapsed 0.01 sec.
> INFO:  vacuuming "pg_toast.pg_toast_19037"
> INFO:  index "pg_toast_19037_index" now contains 57 row versions in 2 pages
> DETAIL:  0 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO:  "pg_toast_19037": found 0 removable, 57 nonremovable row versions in
> 12 out of 12 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> There were 0 unused item pointers.
> 0 pages are entirely empty.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> VACUUM
> Time: 32.890 ms
>
> database=> select max(the same unindexed bigint column) from a;
>        max
> -----------------
>  <some number>
> (1 row)
> Time: 16.696 ms
> (The second iteration takes 15.320 ms)
>
> So, the way I see it, my problem boils down to table "A" getting roughly
> 100-1000x slower when it gets roughly 20-50x bigger (depending if you
> measure in pages or tuples). Unfortunately, in my use case, table "A" acts
> as a join table for a lot of aspects of our company's webapp. Every 10
> minutes, the table is queried for 35 million rows via sequential scan (~800
> seq scans per minute, ~1.3 per second on average), and 6.5 million rows via
> index lookup. When a sequential scan over 40k rows takes less than 1 second,
> everything is fine -- when it takes 10+ seconds the database starts to slow
> down significantly. Thankfully, queries can share sequential scans, but you
> can imagine how the responsiveness of the webapp might suffer as a
> consequence. There's also the secondary effect that, should the query on B
> complete, there now exist many queries against A (and other related tables)
> that are slow enough to potentially increase the size of A even further. It
> is not uncommon for queries involving A to start taking upwards of 30
> minutes to complete, when they usually complete in roughly 300ms, after some
> maintenance query against B has completed.
>
> Our go-to solution has been to detect and stop these maintenance queries if
> they take too long, and then to CLUSTER table A. This puts a cap on how long
> any maintenance query can take -- down to somewhere around 1 hour.
>
> And thus my query to you guys:
>
> What can I do to keep running long maintenance operations on large tables
> (SELECTing significant fractions of B, DELETEing significant fractions of B,
> running VACUUM FULL on B) without denying other Postgresql backends their
> ability to efficiently query table A? Or, in other words, how do I avoid
> incurring the cost of transaction isolation for queries against B on a
> case-by-case basis?
>
> Anything is on the table for implementation:
> - moving tables to a different database / cluster / completely different
> DBMS system
> - designing an extension to tune either sets of queries
> - partitioning tables
> - etc
> ... although the simpler the better. If you were in this position, what
> would you do?
>
> Regards,
> James

You're experiencing bloat because the transaction on B is preventing
the xid horizon from moving forward, thus dead tuples from A cannot be
reclaimed in case the transaction on B decides to query them.

There's only one "easy" solution for this as far as I know, and it is
to run your long-running queries on a hot standby. That certainly
works for most read-only workloads, especially pg_dump.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance