Parallel Append Query

Luís Roberto Weck <luisroberto@xxxxxxxxxxxxxxx> · Fri, 21 Feb 2020 08:16:29 -0300



    Hi!

    
    I'm seeing varying number of rows removed by filter on a parallel
    append query:

    
    CREATE TABLE usuario (

      id bigint generated always as identity,

      nome text

    );

    
    INSERT INTO usuario (nome)

    SELECT md5(seq)::text

        FROM generate_series(1,500000,1) tab(seq);

    
    ANALYZE usuario;

    
    EXPLAIN ANALYZE 

    (SELECT * FROM usuario WHERE nome =
      'c4ca4238a0b923820dcc509a6f75849b')

    UNION ALL

    (SELECT * FROM usuario WHERE nome =
      'c81e728d9d4c2f636f067f89cc14862c')

    
    Gather  (cost=1000.00..15574.56 rows=2 width=41) (actual
      time=0.320..56.154 rows=2 loops=1)

      Workers Planned: 2

      Workers Launched: 2

      ->  Parallel Append  (cost=0.00..14574.36 rows=2
      width=41) (actual time=13.449..47.899 rows=1 loops=3)

            ->  Parallel Seq Scan on usuario 
      (cost=0.00..7287.17 rows=1 width=41) (actual time=8.934..25.476
      rows=0 loops=3)

                  Filter: (nome =
      'c81e728d9d4c2f636f067f89cc14862c'::text)

                  Rows Removed by Filter: 166666

            ->  Parallel Seq Scan on usuario usuario_1 
      (cost=0.00..7287.17 rows=1 width=41) (actual time=13.470..33.630
      rows=0 loops=2)

                  Filter: (nome =
      'c4ca4238a0b923820dcc509a6f75849b'::text)

                  Rows Removed by Filter: 250000

    Planning Time: 0.106 ms

    Execution Time: 56.184 ms

    
    The execution plan varies just by running the query a few
    times, and it goes back and forth between these two:

      
    Gather  (cost=1000.00..15574.56 rows=2 width=41) (actual
      time=0.341..55.499 rows=2 loops=1)

      Workers Planned: 2

      Workers Launched: 2

      ->  Parallel Append  (cost=0.00..14574.36 rows=2
      width=41) (actual time=15.904..49.821 rows=1 loops=3)

            ->  Parallel Seq Scan on usuario 
      (cost=0.00..7287.17 rows=1 width=41) (actual time=0.013..48.591
      rows=1 loops=1)

                  Filter: (nome =
      'c81e728d9d4c2f636f067f89cc14862c'::text)

                  Rows Removed by Filter: 499999

            ->  Parallel Seq Scan on usuario usuario_1 
      (cost=0.00..7287.17 rows=1 width=41) (actual time=15.985..33.621
      rows=0 loops=3)

                  Filter: (nome =
      'c4ca4238a0b923820dcc509a6f75849b'::text)

                  Rows Removed by Filter: 166666

    Planning Time: 0.107 ms

    Execution Time: 55.524 ms

      
    But I can't understand why does it changes the number of loops
    from the queries