Re: In progress INSERT wrecks plans on table

On 26/04/13 14:56, Gavin Flower wrote:
On 26/04/13 14:33, Mark Kirkwood wrote:
Recently we encountered the following unhappy sequence of events:

1/ system running happily
2/ batch load into table begins
3/ very quickly (some) preexisting queries on said table go orders of magnitude slower
4/ database instance becomes unresponsive
5/ application outage

After chasing down a few false leads, we've isolated the cause to the following:

The accumulating in-progress row changes are such that the previously optimal plans are no longer optimal. The situation will fix itself when the next autoanalyze runs (and a new plan is chosen) - however that cannot happen until the batch load is completed and committed (approx 70 seconds). During that time, the performance degradation for queries still using the old plan is enough to cripple the server.

Now that we know what is happening we can work around it. But I'm wondering - is there any way (or, if not, should there be one) to let Postgres handle this automatically? I experimented with a quick hack to src/backend/commands/analyze.c (attached) that lets another session's ANALYZE see in-progress rows - which works, but a) may cause other problems and b) does not help autoanalyze, which has to wait for the COMMIT + stats message.
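
(The details of our workaround aren't important here, but to give a concrete idea of the sort of thing that helps - this is an example only, not necessarily what we deployed: keep each uncommitted batch small and refresh the statistics between chunks, so the in-progress delta never grows large enough to invalidate the existing plans. Chunking by id range as below is arbitrary, just for illustration:)

BEGIN;
INSERT INTO plan
SELECT id + 2000001, typ, current_date + id * '1 seconds'::interval, val
FROM plan
WHERE id <= 500000;     -- first chunk only
COMMIT;
ANALYZE plan;           -- statistics now reflect the committed chunk
-- ...then the next id range, and so on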

I've attached a (synthetic) test case that shows the issue; I'll reproduce the output below to hopefully make the point obvious:


               Table "public.plan"
 Column |            Type             | Modifiers
--------+-----------------------------+-----------
 id     | integer                     | not null
 typ    | integer                     | not null
 dat    | timestamp without time zone |
 val    | text                        | not null
Indexes:
    "plan_id" UNIQUE, btree (id)
    "plan_dat" btree (dat)
    "plan_typ" btree (typ)


[Session 1]
EXPLAIN ANALYZE
SELECT * FROM plan
WHERE typ = 3 AND dat IS NOT NULL;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using plan_dat on plan (cost=0.00..265.47 rows=55 width=117) (actual time=0.130..4.409 rows=75 loops=1)
   Index Cond: (dat IS NOT NULL)
   Filter: (typ = 3)
   Rows Removed by Filter: 5960
 Total runtime: 4.440 ms
(5 rows)

[Session 2]

BEGIN;
INSERT INTO plan
SELECT id + 2000001,typ,current_date + id * '1 seconds'::interval ,val
FROM plan
;

[Session 1]
EXPLAIN ANALYZE
SELECT * FROM plan
WHERE typ = 3 AND dat IS NOT NULL;

                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Index Scan using plan_dat on plan (cost=0.00..551.35 rows=91 width=117) (actual time=0.131..202.699 rows=75 loops=1)
   Index Cond: (dat IS NOT NULL)
   Filter: (typ = 3)
   Rows Removed by Filter: 5960
 Total runtime: 202.729 ms
(5 rows)
[Session 2]
COMMIT;

[Session 1...wait for autoanalyze to finish then]
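
(To see whether it has actually run yet - just a convenience query, not part of the attached test case:)

SELECT last_autoanalyze, n_live_tup
FROM pg_stat_user_tables
WHERE relname = 'plan';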

EXPLAIN ANALYZE
SELECT * FROM plan
WHERE typ = 3 AND dat IS NOT NULL;
                                                          QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on plan (cost=407.87..44991.95 rows=10116 width=117) (actual time=2.692..6.582 rows=75 loops=1)
   Recheck Cond: (typ = 3)
   Filter: (dat IS NOT NULL)
   Rows Removed by Filter: 19925
   ->  Bitmap Index Scan on plan_typ (cost=0.00..405.34 rows=20346 width=0) (actual time=2.573..2.573 rows=20000 loops=1)
         Index Cond: (typ = 3)
 Total runtime: 6.615 ms


Regards

Mark


Hmm...

You need to specify:

1. Version of Postgres
2. Operating system
3. Changes to postgresql.conf
4. CPU/RAM etc
5. Anything else that might be relevant




While in general you are quite correct - in this case (particularly as I've supplied a test case) it should be pretty obvious that any moderately modern version of Postgres on any supported platform will exhibit this.

I produced the above test case on Postgres 9.2.4 on Ubuntu 13.04, with no changes to the default postgresql.conf.


Our actual production server is a 32 CPU box with 512GB RAM and 16 SAS SSDs, running Postgres 9.2.4 on Ubuntu 12.04. And yes, there are quite a few changes from the defaults there - I wasted quite a lot of time chasing issues with high CPU and RAM, and changing various configs to see if they helped, before identifying that the cause was in-progress row changes and planner statistics. Also, in the "real" case with much bigger datasets, the difference between the plan being optimal and *not* being optimal is a factor of 2000x in elapsed time instead of a mere 50x!

regards

Mark



--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
