Re: EXPLAIN (no ANALYZE) taking an hour for INSERT FROM SELECT

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





>Hi, thanks for your follow-up questions.
>- postgres version is 9.1.13
>- the number of rows (in this latest instance) is 28,474,842
>- I've clustered and vacuum-full-ed and analyzed this table frequently, attempting to troubleshoot this. (Running vacuum full on the whole catalog seems a little excessive, and unlikely to help.)

Hi,

I mean the pg_* tables. When working with temp objects and a high number of classes, regular vacuum may not clean them efficiently.
It is not a bad idea to run a vacuum full verbose manually on the largest of those from time to time to verify that they don't grow outer control.
And this normally requires a few seconds only.
The verbose output of vacuum full sometimes returns interesting information...
For the ANALYZE performance, I guess that these are the most relevant one:
 pg_statistic;
 pg_class;
 pg_attribute;
 pg_index;
 pg_constraint;
 
regards,

Marc Mamin



>- no other processes are likely to be interfering; nothing other than PostgreSQL runs on this machine (except for normal OS processes and New Relic server monitoring service); concurrent activity in PostgreSQL is low-level and unrelated, and this effect is observed systematically whenever this kind of operation is performed on this table
>- no override for this table; the system default_statistics_target is 100 (the default)
>- yes, an ANALYZE is performed on the temp table after the COPY and before the INSERT
>- no index on the temp table (but I'm scanning the whole thing anyway). There are indexes on f_foo as detailed in my original post, and I expect the PK to make the WHERE NOT EXISTS filtering efficient (as it filters on exactly all columns of the PK) ... but even if it didn't, I would expect that to only slow down the actual insert execution, not the EXPLAIN.
>Cheers,
>Gulli
>On Wed, Mar 4, 2015 at 8:10 PM, Marc Mamin <M.Mamin@xxxxxxxxxxxx> wrote:
>
>    >Hi,
>    >we are seeing cases of EXPLAIN INSERT INTO foo SELECT ... taking over an hour, with disk I/O utilization (percent of time device is busy) at 100% the whole time, although I/O bandwidth is not saturated. This is on PostgreSQL 9.1.13.
>    >What could cause this? Note that there is no ANALYZE. Is it possible that the SELECT is actually executed, in planning the INSERT?
>    >When executing the INSERT itself (not EXPLAIN) immediately afterwards, that logs a "temporary file" message, but the EXPLAIN invocation does not (though the disk I/O suggests that a large on-disk sort is occurring):
>    >LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp6016.0", size 744103936
>    >STATEMENT:  INSERT INTO f_foo SELECT
>    >[...]
>    >During that actual execution, there's a lower disk I/O utilization (though a higher MB/s rate).
>    >Charts of disk I/O utilization and rate are at http://postimg.org/image/628h6jmn3/ ... the solid 100% span is the EXPLAIN statement, ending at 6:13:30pm, followed by the INSERT statement ending at 6:32pm. Metrics are collected by New Relic; their definition of I/O utilization is at https://discuss.newrelic.com/t/disk-i-o-metrics/2900
>    >Here's the EXPLAIN statement:
>    >LOG:  duration: 3928778.823 ms  statement: EXPLAIN INSERT INTO f_foo SELECT
>    >       t_foo.fk_d1,
>    >       t_foo.fk_d2,
>    >       t_foo.fk_d3,
>    >       t_foo.fk_d4,
>    >       t_foo.fk_d5,
>    >       t_foo.fk_d6,
>    >       t_foo.value
>    >FROM t_foo
>    >WHERE NOT (EXISTS
>    >             (SELECT *
>    >              FROM f_foo
>    >              WHERE f_foo.fk_d2 = t_foo.fk_d2
>    >                AND f_foo.fk_d3 = t_foo.fk_d3
>    >                AND f_foo.fk_d4 = t_foo.fk_d4
>    >                AND f_foo.fk_d5 = t_foo.fk_d5
>    >                AND f_foo.fk_d6 = t_foo.fk_d6
>    >                AND f_foo.fk_d1 = t_foo.fk_d1))
>    >(where t_foo is a temp table previously populated using COPY, and the NOT EXISTS subquery refers to the same table we are inserting into)
>    >Here's the EXPLAIN output:
>    >Insert on f_foo  (cost=8098210.50..9354519.69 rows=1 width=16)
>    >  ->  Merge Anti Join  (cost=8098210.50..9354519.69 rows=1 width=16)
>    >        Merge Cond: ((t_foo.fk_d2 = public.f_foo.fk_d2) AND
>    >                     (t_foo.fk_d3 = public.f_foo.fk_d3) AND
>    >                     (t_foo.fk_d4 = public.f_foo.fk_d4) AND
>    >                     (t_foo.fk_d5 = public.f_foo.fk_d5) AND
>    >                     (t_foo.fk_d6 = public.f_foo.fk_d6) AND
>    >                     (t_foo.fk_d1 = public.f_foo.fk_d1))
>    >        ->  Sort  (cost=3981372.25..4052850.70 rows=28591380 width=16)
>    >              Sort Key: t_foo.fk_d2, t_foo.fk_d3, t_foo.fk_d4, t_foo.fk_d5,
>    >                        t_foo.fk_d6, t_foo.fk_d1
>    >              ->  Seq Scan on t_foo  (cost=0.00..440461.80 rows=28591380
>    >                                      width=16)
>    >        ->  Sort  (cost=4116838.25..4188025.36 rows=28474842 width=12)
>    >              Sort Key: public.f_foo.fk_d2, public.f_foo.fk_d3,
>    >                        public.f_foo.fk_d4, public.f_foo.fk_d5,
>    >                        public.f_foo.fk_d6, public.f_foo.fk_d1
>    >              ->  Seq Scan on f_foo  (cost=0.00..591199.42 rows=28474842
>    >                                      width=12)
>    >The INSERT is indeed rather large (which is why we're issuing an EXPLAIN ahead of it to log the plan). So its long execution time is expected. But I want to understand why the EXPLAIN takes even longer.
>    >The table looks like this:
>    >\d f_foo
>    >Table "public.f_foo"
>    > Column |   Type   | Modifiers
>    >--------+----------+-----------
>    > fk_d1  | smallint | not null
>    > fk_d2  | smallint | not null
>    > fk_d3  | smallint | not null
>    > fk_d4  | smallint | not null
>    > fk_d5  | smallint | not null
>    > fk_d6  | smallint | not null
>    > value  | integer  |
>    >Indexes:
>    >    "f_foo_pkey" PRIMARY KEY, btree (fk_d2, fk_d6, fk_d4, fk_d3, fk_d5, fk_d1) CLUSTER
>    >    "ix_f_foo_d4" btree (fk_d4)
>    >    "ix_f_foo_d3" btree (fk_d3)
>    >    "ix_f_foo_d5" btree (fk_d5)
>    >    "ix_f_foo_d6" btree (fk_d6)
>    >Foreign-key constraints:
>    >    "f_foo_d2_fkey" FOREIGN KEY (fk_d2) REFERENCES d2(id) DEFERRABLE
>    >    "f_foo_d6_fkey" FOREIGN KEY (fk_d6) REFERENCES d6(id) DEFERRABLE
>    >    "f_foo_d5_fkey" FOREIGN KEY (fk_d5) REFERENCES d5(id) DEFERRABLE
>    >    "f_foo_d4_fkey" FOREIGN KEY (fk_d4) REFERENCES d4(id) DEFERRABLE
>    >    "f_foo_d3_fkey" FOREIGN KEY (fk_d3) REFERENCES d3(id) DEFERRABLE
>    >Conceivably relevant (though I don't know how): this database has a very large number of table objects (1.3 million rows in pg_class). But other EXPLAINs are not taking anywhere near this long in this DB; the heavy EXPLAIN is only seen on INSERT into this and a couple of other tables with tens of millions of rows.
>    >Any ideas?
>    >Thanks, best regards,
>    >- Gulli
>    >
>    Hi,
>    I've no clue for the time required by EXPLAIN
>    but some more information are probably relevant to find an explanation:
>
>    - postgres version
>    - number of rows inserted by the query
>    - how clean is your catalog in regard to vacuum
>       ( can you run vacuum full verbose & analyze it, and then retry the analyze statement ?)
>    - any other process that may interfere, e.g. while locking some catalog tables ?
>    - statistic target ?
>    - is your temp table analyzed?
>    - any index on it ?
>       
>    We have about 300'000 entries in our pg_class tables, and I've never seen such an issue.   
>
>    regards,
>    Marc Mamin
>
>

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux