Hi all,
One of my customers has reported a performance problem
when using the E-Maj extension.
It is a tool that logs the updates performed on
application tables, with the ability to cancel them.
It is based on classic triggers, with a log table
associated with each application table.
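For readers who do not know the extension, here is a minimal
sketch of such a trigger-based log mechanism. The names and the
exact logic are hypothetical, not the actual E-Maj code:

  -- Application table and its log table: the application columns
  -- plus a global change id and the type of row image.
  CREATE TABLE app_t (c1 int PRIMARY KEY, c2 text);

  CREATE TABLE app_t_log (
    c1    int,
    c2    text,
    gid   bigserial,      -- global order of the changes
    tuple varchar(3)      -- 'OLD' or 'NEW' row image
  );

  CREATE FUNCTION app_t_log_fct() RETURNS trigger AS $$
  BEGIN
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
      INSERT INTO app_t_log (c1, c2, tuple) VALUES (OLD.c1, OLD.c2, 'OLD');
    END IF;
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
      INSERT INTO app_t_log (c1, c2, tuple) VALUES (NEW.c1, NEW.c2, 'NEW');
    END IF;
    RETURN NULL;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER app_t_log_trg
    AFTER INSERT OR UPDATE OR DELETE ON app_t
    FOR EACH ROW EXECUTE PROCEDURE app_t_log_fct();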
The performance issue, encountered in very specific
situations, is the time needed to cancel a significant
number of insertions.
I have built a simple test case that reproduces the
problem without needing the extension; it just mimics
its behaviour.
Attached are the psql script and its result.
The updates cancellation operation is done in 3 steps
(sketched in SQL after this list):
- create a temporary table holding each primary key to
process,
- delete from the application table all rows that are no
longer wanted (previously inserted rows and the new values
of updated rows),
- insert into the application table the old rows we want to
see again (previously deleted rows and the old values of
updated rows).
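Using the table names of the test case described below (t1 for
the application table, t1_log for its log table), the three steps
look roughly as follows. This is only a sketch: the exact
statements are in the attached script, and the "gid > 0" selection
criterion is a hypothetical stand-in for the mark from which
updates are cancelled.

  -- Step 1: collect the key of every row to process.
  CREATE TEMP TABLE tmp AS
    SELECT c1, c2, c3, c4, c5, c6, c7, gid
      FROM t1_log
      WHERE gid > 0;

  -- Step 2: delete the rows that are no longer wanted.
  DELETE FROM t1
    USING tmp
    WHERE t1.c1 = tmp.c1 AND t1.c2 = tmp.c2 AND t1.c3 = tmp.c3
      AND t1.c4 = tmp.c4 AND t1.c5 = tmp.c5 AND t1.c6 = tmp.c6
      AND t1.c7 = tmp.c7;

  -- Step 3: re-insert the old row images (the slow statement,
  -- analyzed below).
  INSERT INTO t1
    SELECT t1_log.c1, t1_log.c2, t1_log.c3, t1_log.c4,
           t1_log.c5, t1_log.c6, t1_log.c7, t1_log.c8
      FROM t1_log, tmp
      WHERE t1_log.c1 = tmp.c1 AND t1_log.c2 = tmp.c2 AND t1_log.c3 = tmp.c3
        AND t1_log.c4 = tmp.c4 AND t1_log.c5 = tmp.c5 AND t1_log.c6 = tmp.c6
        AND t1_log.c7 = tmp.c7
        AND t1_log.gid = tmp.gid AND t1_log.tuple = 'OLD';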
The performance problem only concerns the third statement
(the INSERT).
I have run this test case on various recent PostgreSQL
versions, from 9.1 to 9.6, with the same results.
The problem appears when:
- the application table has a primary key with a large
number of columns (at least 7 columns in this test case),
- and nothing but INSERT statements have been executed on
the application table,
- and the log trigger remains active (to provide a nice
feature: cancelling the cancellation!).
In the test case, I create a table and populate it with
100,000 rows, create the log mechanism, then insert 10,000
rows and finally cancel the insertion of these 10,000 rows.
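Condensed into SQL, the scenario looks like this. Again a
sketch: the exact DDL, trigger and index definitions are in the
attached test.sql (the plans below show that the log table carries
an index named t1_log_gid_tuple_idx).

  -- Application table with a 7-column primary key.
  CREATE TABLE t1 (
    c1 int, c2 int, c3 int, c4 int, c5 int, c6 int, c7 int, c8 int,
    PRIMARY KEY (c1, c2, c3, c4, c5, c6, c7)
  );

  -- Initial population, done before the log mechanism exists.
  INSERT INTO t1
    SELECT i, i, i, i, i, i, i, i FROM generate_series(1, 100000) AS i;

  -- Create the log table, trigger and index (as sketched earlier),
  -- then insert the 10,000 rows whose insertion will be cancelled;
  -- each of them produces a 'NEW' image in t1_log.
  INSERT INTO t1
    SELECT i, i, i, i, i, i, i, i FROM generate_series(100001, 110000) AS i;

  -- Once the log table is analyzed at this point, the statistics
  -- claim that t1_log.tuple contains nothing but 'NEW' values.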
The offending INSERT statement has the following EXPLAIN output:
explain analyze
INSERT INTO t1
  SELECT t1_log.c1, t1_log.c2, t1_log.c3, t1_log.c4,
         t1_log.c5, t1_log.c6, t1_log.c7, t1_log.c8
  FROM t1_log, tmp
  WHERE t1_log.c1 = tmp.c1 AND t1_log.c2 = tmp.c2 AND t1_log.c3 = tmp.c3
    AND t1_log.c4 = tmp.c4 AND t1_log.c5 = tmp.c5 AND t1_log.c6 = tmp.c6
    AND t1_log.c7 = tmp.c7
    AND t1_log.gid = tmp.gid AND t1_log.tuple = 'OLD';
                                      QUERY PLAN
------------------------------------------------------------------------------------------
 Insert on t1  (cost=0.00..890.90 rows=1 width=32) (actual time=434571.193..434571.193 rows=0 loops=1)
   ->  Nested Loop  (cost=0.00..890.90 rows=1 width=32) (actual time=434571.187..434571.187 rows=0 loops=1)
         Join Filter: ((t1_log.c1 = tmp.c1) AND (t1_log.c2 = tmp.c2) AND (t1_log.c3 = tmp.c3) AND (t1_log.c4 = tmp.c4) AND (t1_log.c5 = tmp.c5) AND (t1_log.c6 = tmp.c6) AND (t1_log.c7 = tmp.c7) AND (t1_log.gid = tmp.gid))
         Rows Removed by Join Filter: 100000000
         ->  Index Scan using t1_log_gid_tuple_idx on t1_log  (cost=0.00..423.22 rows=1 width=40) (actual time=0.378..69.594 rows=10000 loops=1)
               Index Cond: ((tuple)::text = 'OLD'::text)
         ->  Seq Scan on tmp  (cost=0.00..176.17 rows=9717 width=36) (actual time=0.006..21.678 rows=10000 loops=10000)
 Total runtime: 434571.243 ms
(8 rows)

Time: 434572,146 ms
When the conditions are not exactly met, I get:
explain analyze
INSERT INTO t1
  SELECT t1_log.c1, t1_log.c2, t1_log.c3, t1_log.c4,
         t1_log.c5, t1_log.c6, t1_log.c7, t1_log.c8
  FROM t1_log, tmp
  WHERE t1_log.c1 = tmp.c1 AND t1_log.c2 = tmp.c2 AND t1_log.c3 = tmp.c3
    AND t1_log.c4 = tmp.c4 AND t1_log.c5 = tmp.c5 AND t1_log.c6 = tmp.c6
    AND t1_log.c7 = tmp.c7
    AND t1_log.gid = tmp.gid AND t1_log.tuple = 'OLD';
                                      QUERY PLAN
------------------------------------------------------------------------------------------
 Insert on t1  (cost=438.65..906.34 rows=1 width=32) (actual time=111.526..111.526 rows=0 loops=1)
   ->  Hash Join  (cost=438.65..906.34 rows=1 width=32) (actual time=111.521..111.521 rows=0 loops=1)
         Hash Cond: ((tmp.c1 = t1_log.c1) AND (tmp.c2 = t1_log.c2) AND (tmp.c3 = t1_log.c3) AND (tmp.c4 = t1_log.c4) AND (tmp.c5 = t1_log.c5) AND (tmp.c6 = t1_log.c6) AND (tmp.c7 = t1_log.c7) AND (tmp.gid = t1_log.gid))
         ->  Seq Scan on tmp  (cost=0.00..176.17 rows=9717 width=36) (actual time=0.007..22.444 rows=10000 loops=1)
         ->  Hash  (cost=435.68..435.68 rows=99 width=40) (actual time=58.300..58.300 rows=10000 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 586kB
               ->  Seq Scan on t1_log  (cost=0.00..435.68 rows=99 width=40) (actual time=2.281..28.430 rows=10000 loops=1)
                     Filter: ((tuple)::text = 'OLD'::text)
                     Rows Removed by Filter: 10000
 Total runtime: 111.603 ms
(10 rows)
So in the bad case we get a nested loop instead of a hash
join. What looks strange to me in this nested loop is that
the seq scan on the tmp table is executed 10,000 times (once
for each t1_log row returned by the index scan), even though
the planner estimates that almost no row matches the
"t1_log.tuple = 'OLD'" condition (estimated rows=1, actual
rows=10,000). That is 10,000 x 10,000 = 100,000,000 row
comparisons (the "Rows Removed by Join Filter" count above),
a dramatic O(n^2) behaviour.
I have also noticed that the problem disappears when:
- an index is added on the temporary table,
- or the log trigger is disabled,
- or enable_nestloop is disabled (currently my favourite
workaround),
- or the row concerning the "tuple" column of the log table
is deleted from pg_statistic (that row presently says the
column contains nothing but 'NEW' values).
The first and third workarounds are sketched after this list.
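As SQL, those two workarounds could look like this. The index
column list is an assumption; any index that keeps the planner from
repeatedly seq-scanning tmp should do:

  -- Workaround 1: index the temporary table.
  CREATE INDEX tmp_idx ON tmp (c1, c2, c3, c4, c5, c6, c7, gid);
  ANALYZE tmp;

  -- Workaround 3: disable nested loops around the offending INSERT
  -- only, which brings back the hash join of the good plan above.
  SET enable_nestloop = off;
  -- ... run the INSERT ...
  RESET enable_nestloop;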
Thanks in advance for any explanation of this case.
Best regards,
Philippe.
Attachments: test.sql, test_ko1.log