Re: Unexplainable execution time difference between two test functions...one using IF (SELECT COUNT(*) FROM...) and the other using IF EXISTS (SELECT 1 FROM...)

Hackety Man <hacketyman@xxxxxxxxx> · Tue, 17 Apr 2018 10:01:14 -0400

On Tue, Apr 17, 2018 at 6:49 AM, Tomas Vondra <tomas.vondra@xxxxxxxxxxxxxxx> wrote:

On 04/16/2018 10:42 PM, Hackety Man wrote:

...

    The first thing I did was to run some baseline tests using the basic

    queries inside of the IF() checks found in each of the functions to

    see how the query planner handled them.  I ran the following two

    queries.

        EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM zz_noidx1 WHERE

        LOWER(text_distinct) = LOWER('Test5000001');

        EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE

        LOWER(text_distinct) = LOWER('Test5000001');

Those are not the interesting plans, though. The EXISTS only cares about the first row, so you should be looking at

    EXPLAIN (ANALYZE, BUFFERS) SELECT 1 FROM zz_noidx1 WHERE

    LOWER(text_distinct) = LOWER('Test5000001') LIMIT 1;

Okay.  I tested this query and it did return an execution time on par with my tests of the "zz_spx_ifexists_noidx" function.

    I moved on to test the other function with the following query...

        EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM

        zz_spx_ifcount_noidx('Test5000001');

    and I got the following "auto_explain" results...

        2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 426.279 ms         plan:

          Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE

        LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))

          Result  (cost=4.08..4.09 rows=1 width=1) (actual

        time=426.274..426.274 rows=1 loops=1)

            Buffers: shared read=5406

            InitPlan 1 (returns $0)

           ->  Seq Scan on zz_noidx1  (cost=0.00..20406.00 rows=5000

        width=0) (actual time=426.273..426.273 rows=0 loops=1)

              Filter: (lower(text_distinct) = 'test5000001'::text)

              Rows Removed by Filter: 1000000

              Buffers: shared read=5406

        2018-04-16 14:58:34.134 EDT [12616] CONTEXT:  SQL statement

        "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE

        LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"

          PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF

        2018-04-16 14:58:34.134 EDT [12616] LOG:  duration: 428.077 ms         plan:

          Query Text: explain (analyze, buffers) select * from

        zz_spx_ifexists_noidx('Test5000001')

          Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26

        rows=1 width=32) (actual time=428.076..428.076 rows=1 loops=1)

            Buffers: shared hit=30 read=5438

    Definitely not the execution time, or query plan, results I was

    expecting.  As we can see, no workers were employed here and my

    guess was that this was the reason or the large execution time

    difference between these 2 tests?  199 milliseconds versus 428

    milliseconds, which is a big difference.  Why are workers not being

    employed here like they were when I tested the query found inside of

    the IF() check in a standalone manner?  But then I ran another test

    and the results made even less sense to me.

The plan difference is due to not realizing the EXISTS essentially implies LIMIT 1. Secondly, it expects about 5000 rows matching the condition,  uniformly spread through the table. But it apparently takes much longer to find the first one, hence the increased duration.

Ah.  I did not know that.  So EXISTS inherently applies a LIMIT 1, even though it doesn't show in the query plan, correct?  Is it not possible for parallel scans to be implemented while applying an implicit, or explicit, LIMIT 1?

How did you generate the data?

I used generate_series() to create 1 million records in sequence at the same time that I created the table using the following script...
CREATE TABLE zz_noidx1 AS SELECT generate_series(0, 999999) AS int_distinct, 'Test'::text || generate_series(0, 999999)::text AS text_distinct; 

    When I ran the above query the first 5 times after starting my

    Postgres service, I got the same results each time (around 428

    milliseconds), but when running the query 6 or more times, the

    execution time jumps up to almost double that.  Here are the

    "auto_explain" results running this query a 6th time...

This is likely due to generating a generic plan after the fifth execution. There seems to be only small difference in costs, though.

        --"auto_explain" results after running the same query 6 or more

        times.

        2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 761.847 ms         plan:

          Query Text: SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE

        LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))

          Result  (cost=4.58..4.59 rows=1 width=1) (actual

        time=761.843..761.843 rows=1 loops=1)

            Buffers: shared hit=160 read=5246

            InitPlan 1 (returns $0)

           ->  Seq Scan on zz_noidx1  (cost=0.00..22906.00 rows=5000

        width=0) (actual time=761.841..761.841 rows=0 loops=1)

              Filter: (lower(text_distinct) = lower($1))

              Rows Removed by Filter: 1000000

              Buffers: shared hit=160 read=5246

        2018-04-16 15:01:51.635 EDT [12616] CONTEXT:  SQL statement

        "SELECT EXISTS (SELECT 1 FROM zz_noidx1 WHERE

        LOWER(zz_noidx1.text_distinct) = LOWER(p_findme))"

          PL/pgSQL function zz_spx_ifexists_noidx(text) line 4 at IF

        2018-04-16 15:01:51.635 EDT [12616] LOG:  duration: 762.156 ms         plan:

          Query Text: explain (analyze, buffers) select * from

        zz_spx_ifexists_noidx('Test5000001')

          Function Scan on zz_spx_ifexists_noidx  (cost=0.25..0.26

        rows=1 width=32) (actual time=762.154..762.155 rows=1 loops=1)

            Buffers: shared hit=160 read=5246

    As you can see, the execution time jumps up to about 762

    milliseonds.  I can see in the sequence scan node that the LOWER()

    function shows up on the right side of the equal operator, whereas

    in the first 5 runs of this test query the plan did not show this.     Why is this?

It doesn't really matter on which side it shows, it's more about a generic plan built without knowledge of the parameter value.

Right.  I was more wondering why it switched over to a generic plan, as you've stated, like clockwork starting with the 6th execution run.

regards

-- 

Tomas Vondra                  http://www.2ndQuadrant.com

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services