Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)

Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx> · Tue, 17 Apr 2018 16:23:15 +0200

On 04/17/2018 04:01 PM, Hackety Man wrote:

On Tue, Apr 17, 2018 at 6:49 AM, Tomas Vondra 
<tomas.vondra@xxxxxxxxxxxxxxx <mailto:tomas.vondra@xxxxxxxxxxxxxxx>> wrote:

    On 04/16/2018 10:42 PM, Hackety Man wrote:

        ...
             The first thing I did was to run some baseline tests using
        the basic
             queries inside of the IF() checks found in each of the
        functions to
             see how the query planner handled them.  I ran the
        following two
             queries.

                 EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM
        zz_noidx1 WHERE
                 LOWER(text_distinct) = LOWER('Test5000001');
                 EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
                 LOWER(text_distinct) = LOWER('Test5000001');

    Those are not the interesting plans, though. The EXISTS only cares
    about the first row, so you should be looking at

         EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE
         LOWER(text_distinct) = LOWER('Test5000001') LIMIT 1;

Okay.  I tested this query and it did return an execution time on par 
with my tests of the "zz_spx_ifexists_noidx" function.
*
*

             I moved on to test the other function with the following
        query...

                 EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM
                 zz_spx_ifcount_noidx('Test5000001');

             and I got the following "auto_explain" results...

                 2018-04-16 14:58:34.134 EDT [12616] LOG:  duration:
        426.279 ms         plan:
                   Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
                 LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
                   Result  (cost=4.08..4.09 rows=1 width=1) (actual
                 time=426.274..426.274 rows=1 loops=1)
                     Buffers: shared read=5406
                     InitPlan 1 (returns $0)
                    ->  Seq Scan on zz_noidx1  (cost=0.00..20406.00
        rows=5000
                 width=0) (actual time=426.273..426.273 rows=0 loops=1)
                       Filter: (lower(text_distinct) = 'test5000001'::text)
                       Rows Removed by Filter: 1000000
                       Buffers: shared read=5406
                 2018-04-16 14:58:34.134 EDT [12616] CONTEXT:  SQL statement
                 "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
                 LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
                   PL/pgSQL function zz_spx_ifexists_noidx(text) line 4
        at IF
                 2018-04-16 14:58:34.134 EDT [12616] LOG:  duration:
        428.077 ms         plan:
                   Query Text: explain (analyze, buffers) select * from
                 zz_spx_ifexists_noidx('Test5000001')
                   Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26
                 rows=1 width=32) (actual time=428.076..428.076 rows=1
        loops=1)
                     Buffers: shared hit=30 read=5438

             Definitely not the execution time, or query plan, results I was
             expecting.  As we can see, no workers were employed here and my
             guess was that this was the reason or the large execution time
             difference between these 2 tests?  199 milliseconds versus 428
             milliseconds, which is a big difference.  Why are workers
        not being
             employed here like they were when I tested the query found
        inside of
             the IF() check in a standalone manner?  But then I ran
        another test
             and the results made even less sense to me.

    The plan difference is due to not realizing the EXISTS essentially
    implies LIMIT 1. Secondly, it expects about 5000 rows matching the
    condition,  uniformly spread through the table. But it apparently
    takes much longer to find the first one, hence the increased duration.

Ah.  I did not know that.  So EXISTS inherently applies a LIMIT 1, even 
though it doesn't show in the query plan, correct? Is it not possible 
for parallel scans to be implemented while applying an implicit, or 
explicit, LIMIT 1?
**//___^

It doesn't add a limit node to the plan, but it behaves similarly to 
that. The database only needs to fetch the first row to answer the 
EXISTS predicate.

I don't think this is inherently incompatible with parallel plans, but 
the planner simply thinks it's going to bee very cheap - cheaper than 
setting up parallel workers etc. So it does not do that.

    How did you generate the data?

I used generate_series() to create 1 million records in sequence at the 
same time that I created the table using the following script...

    CREATE TABLE zz_noidx1 AS SELECT generate_series(0, 999999) AS
    int_distinct, 'Test'::text || generate_series(0, 999999)::text AS
    text_distinct;

Which means that there are actually no matching rows for 'Test5000001'. 
So the database will scan the whole table anyway, in order to answer the 
EXISTS condition. The estimate of 5000 matching rows is a default value 
(0.5% out of 1M rows), because the value is entirely out of the data 
range covered by the histogram.

The easiest solution probably is adding an index on that column, which 
will make answering the EXISTS much faster (at least in this case).

             When I ran the above query the first 5 times after starting my
             Postgres service, I got the same results each time (around 428
             milliseconds), but when running the query 6 or more times, the
             execution time jumps up to almost double that.  Here are the
             "auto_explain" results running this query a 6th time...

    This is likely due to generating a generic plan after the fifth
    execution. There seems to be only small difference in costs, though.

                 --"auto_explain" results after running the same query 6
        or more
                 times.
                 2018-04-16 15:01:51.635 EDT [12616] LOG:  duration:
        761.847 ms         plan:
                   Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
                 LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))
                   Result  (cost=4.58..4.59 rows=1 width=1) (actual
                 time=761.843..761.843 rows=1 loops=1)
                     Buffers: shared hit=160 read=5246
                     InitPlan 1 (returns $0)
                    ->  Seq Scan on zz_noidx1  (cost=0.00..22906.00
        rows=5000
                 width=0) (actual time=761.841..761.841 rows=0 loops=1)
                       Filter: (lower(text_distinct) = lower($1))
                       Rows Removed by Filter: 1000000
                       Buffers: shared hit=160 read=5246
                 2018-04-16 15:01:51.635 EDT [12616] CONTEXT:  SQL statement
                 "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE
                 LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"
                   PL/pgSQL function zz_spx_ifexists_noidx(text) line 4
        at IF
                 2018-04-16 15:01:51.635 EDT [12616] LOG:  duration:
        762.156 ms         plan:
                   Query Text: explain (analyze, buffers) select * from
                 zz_spx_ifexists_noidx('Test5000001')
                   Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26
                 rows=1 width=32) (actual time=762.154..762.155 rows=1
        loops=1)
                     Buffers: shared hit=160 read=5246

             As you can see, the execution time jumps up to about 762
             milliseonds.  I can see in the sequence scan node that the
        LOWER()
             function shows up on the right side of the equal operator,
        whereas
             in the first 5 runs of this test query the plan did not
        show this.     Why is this?

    It doesn't really matter on which side it shows, it's more about a
    generic plan built without knowledge of the parameter value.

Right.  I was more wondering why it switched over to a generic plan, as 
you've stated, like clockwork starting with the 6th execution run.

That's a hard-coded value. The first 5 executions are re-planned using 
the actual parameter values, and then we try generating a generic plan 
and see if it's cheaper than the non-generic one. You can disable that, 
though.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services