On Tue, Mar 21, 2017 at 10:56 PM, Claudio Freire <klaussfreire@xxxxxxxxx> wrote:
> On Tue, Mar 21, 2017 at 4:24 PM, James Parks <james.parks@xxxxxxxxxx> wrote:
>> ... and here's how long it takes to read all of the rows:
>> database=> select max(an unindexed bigint column) from a;
>> -[ RECORD 1 ]--------
>> max | <some number>
>> Time: 10624.368 ms
>>
>> Running this another time immediately afterward (to show the cached speed)
>> returns:
>> Time: 13782.363 ms
>>
>> If I go to a separate database cluster that has an equivalent schema, and
>> roughly equivalent table a (+- 2% on the number of rows), the above queries
>> look more like this:
>>
>> meraki_shard_production=> vacuum verbose a;
>> INFO: vacuuming "public.a"
>> INFO: index "a_pkey" now contains 42171 row versions in 162 pages
>> DETAIL: 0 index row versions were removed.
>> 0 index pages have been deleted, 0 are currently reusable.
>> CPU 0.00s/0.00u sec elapsed 0.00 sec.
>> INFO: "a": found 487 removable, 42286 nonremovable row versions in 7809 out
>> of 7853 pages
>> DETAIL: 373 dead row versions cannot be removed yet.
>> There were 42436 unused item pointers.
>> 0 pages are entirely empty.
>> CPU 0.00s/0.02u sec elapsed 0.01 sec.
>> INFO: vacuuming "pg_toast.pg_toast_19037"
>> INFO: index "pg_toast_19037_index" now contains 57 row versions in 2 pages
>> DETAIL: 0 index row versions were removed.
>> 0 index pages have been deleted, 0 are currently reusable.
>> CPU 0.00s/0.00u sec elapsed 0.00 sec.
>> INFO: "pg_toast_19037": found 0 removable, 57 nonremovable row versions in
>> 12 out of 12 pages
>> DETAIL: 0 dead row versions cannot be removed yet.
>> There were 0 unused item pointers.
>> 0 pages are entirely empty.
>> CPU 0.00s/0.00u sec elapsed 0.00 sec.
>> VACUUM
>> Time: 32.890 ms
>>
>> database=> select max(the same unindexed bigint column) from a;
>>       max
>> -----------------
>>  <some number>
>> (1 row)
>> Time: 16.696 ms
>> (The second iteration takes 15.320 ms)
>>
>> So, the way I see it, my problem boils down to table "A" getting roughly
>> 100-1000x slower when it gets roughly 20-50x bigger (depending if you
>> measure in pages or tuples). Unfortunately, in my use case, table "A" acts
>> as a join table for a lot of aspects of our company's webapp. Every 10
>> minutes, the table is queried for 35 million rows via sequential scan (~800
>> seq scans per minute, ~1.3 per second on average), and 6.5 million rows via
>> index lookup. When a sequential scan over 40k rows takes less than 1 second,
>> everything is fine -- when it takes 10+ seconds the database starts to slow
>> down significantly. Thankfully, queries can share sequential scans, but you
>> can imagine how the responsiveness of the webapp might suffer as a
>> consequence. There's also the secondary effect that, should the query on B
>> complete, there now exist many queries against A (and other related tables)
>> that are slow enough to potentially increase the size of A even further. It
>> is not uncommon for queries involving A to start taking upwards of 30
>> minutes to complete, when they usually complete in roughly 300ms, after some
>> maintenance query against B has completed.
>>
>> Our go-to solution has been to detect and stop these maintenance queries if
>> they take too long, and then to CLUSTER table A. This puts a cap on how long
>> any maintenance query can take -- down to somewhere around 1 hour.
>>
>> And thus my query to you guys:
>>
>> What can I do to keep running long maintenance operations on large tables
>> (SELECTing significant fractions of B, DELETEing significant fractions of B,
>> running VACUUM FULL on B) without denying other Postgresql backends their
>> ability to efficiently query table A? Or, in other words, how do I avoid
>> incurring the cost of transaction isolation for queries against B on a
>> case-by-case basis?
>>
>> Anything is on the table for implementation:
>> - moving tables to a different database / cluster / completely different
>>   DBMS system
>> - designing an extension to tune either sets of queries
>> - partitioning tables
>> - etc
>> ... although the simpler the better. If you were in this position, what
>> would you do?
>>
>> Regards,
>> James
>
> You're experiencing bloat because the transaction on B is preventing
> the xid horizon from moving forward, thus dead tuples from A cannot be
> reclaimed in case the transaction on B decides to query them.
>
> There's only one "easy" solution for this as far as I know, and it is
> to run your long-running queries on a hot standby. That certainly
> works for most read-only workloads, especially pg_dump.

Forgot to clarify... for your use case, make sure you *don't* enable
standby feedback on the standby.
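
For reference, the setting in question is hot_standby_feedback in the
standby's postgresql.conf; it defaults to off, so it only needs to be left
that way. A minimal sketch, assuming a streaming-replication standby
dedicated to the long maintenance queries:

    # postgresql.conf on the standby
    # Leave feedback off so long-running queries here cannot hold back
    # vacuum (and thus cause bloat) on the primary.
    hot_standby_feedback = off
    # Optionally give long queries on the standby more slack before they
    # get cancelled by replayed cleanup records from the primary:
    max_standby_streaming_delay = 30min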
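
And to confirm the diagnosis on the primary, a rough check (assuming
PostgreSQL 9.4 or later, where pg_stat_activity exposes backend_xmin) is to
look at which sessions are holding back the xmin horizon while the
maintenance query on B runs:

    -- Sessions ordered by how far back they pin the xmin horizon; the
    -- long-running query on B should show up with a large xmin_age,
    -- which is what keeps vacuum from reclaiming dead tuples in A.
    SELECT pid,
           state,
           age(backend_xmin) AS xmin_age,
           now() - xact_start AS xact_runtime,
           left(query, 60) AS query
    FROM pg_stat_activity
    WHERE backend_xmin IS NOT NULL
    ORDER BY age(backend_xmin) DESC;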