Re: "NOT IN" substantially slower in 9.0.2 than 8.3.13 - NOT EXISTS runs fast in both 8.3.13 and 9.0.2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Achilleas Mantzios wrote:
Thanx,

ÎÏÎÏ Monday 17 January 2011 18:52:27 Î/Î Ing. Marcos Ortiz Valmaseda ÎÎÏÎÏÎ:
Well, on the Release Notes on the PostgreSQL-8.4 Documentation, the developers recommend to use NOT EXISTS
instead NOT IN, because the first clause has a better performance. So, you can use it on that way.

You mean this?
(from 8.4 changes)
"Create explicit concepts of semi-joins and anti-joins (Tom)
       This work formalizes our previous ad-hoc treatment of IN (SELECT
       ...) clauses, and extends it to EXISTS and NOT EXISTS clauses. It
       should result in significantly better planning of EXISTS and NOT
       EXISTS queries. In general, logically equivalent IN and EXISTS
       clauses should now have similar performance, whereas previously IN
       often won."

I haven't found any other recent reference to this issue. And this is far from what you suggest.
Here the entry talks about "similar" performance.

Also a similar issue was hot back in 7.4 days :
"IN / NOT IN subqueries are now much more efficient
          In previous releases, IN/NOT IN subqueries were joined to the
          upper query by sequentially scanning the subquery looking for a
          match. The 7.4 code uses the same sophisticated techniques used
          by ordinary joins and so is much faster. An IN will now usually
          be as fast as or faster than an equivalent EXISTS subquery; this
          reverses the conventional wisdom that applied to previous
          releases."

Other questions?
- Do you have a partial index on marinerstates.marinerid where this condition is accomplished?
No, but i just tried it (on state='Active') with no impact.
- Do you have a index on mariner.id?
Yes, It is the primary key.
- Can you provide a explain of these queries on the PostgreSQL-9.0 machines?
Sure, first i'll post the table definitions and then some stats and then the epxlain analyze(s)

mariner
=====
 id                           | integer                  | not null default nextval(('public.mariner_id_seq'::text)::regclass)
 givenname                    | character varying(200)   |
 midname                      | character varying(100)   |
 surname                      | character varying(200)   | not null
...
Indexes:
    "mariner_pkey" PRIMARY KEY, btree (id)
    "mariner_smauid" UNIQUE, btree (smauid)
    "mariner_username_key" UNIQUE, btree (username)
    "mariner_nationalityid" btree (nationalityid)
    "mariner_parentid" btree (parentid)
    "mariner_surname" btree (surname)

marinerstates
========
 id                     | integer                  | not null default nextval(('public.marinerstates_id_seq'::text)::regclass)
 marinerid              | integer                  | not null
 state                  | character varying(20)    | not null
 vslid                  | integer                  |
 leave_period_days      | integer                  |
 comment                | text                     |
 starttime              | timestamp with time zone | not null
 endtime                | timestamp with time zone |
 trid                   | integer                  |
 sal_bw                 | real                     | not null default 0.0
 sal_ot                 | real                     | not null default 0.0
 sal_lp                 | real                     | not null default 0.0
 sal_misc               | real                     | not null default 0.0
 rankid                 | integer                  |
 system_vslid           | integer                  |
 startport              | text                     |
 endport                | text                     |
.....
Indexes:
    "marinerstates_pkey" PRIMARY KEY, btree (id)
    "marinerstates_mariner_cur_state" UNIQUE, btree (marinerid) WHERE endtime IS NULL
    "marinerstates_system_vslid" UNIQUE, btree (marinerid, system_vslid)
    "marinerstates__system_vslid" btree (system_vslid)
    "marinerstates_cur_mariners_states" btree (endtime) WHERE endtime IS NULL
    "marinerstates_mariner_past_state" btree (marinerid, starttime, endtime) WHERE endtime IS NOT NULL
    "marinerstates_marinerid" btree (marinerid)
    "marinerstates_marinerid_starttime" btree (marinerid, starttime)
    "marinerstates_rankid" btree (rankid)
    "marinerstates_rankid_cur_mariners" btree (rankid) WHERE endtime IS NULL
    "marinerstates_rankid_past_state" btree (rankid, starttime, endtime) WHERE endtime IS NOT NULL
    "marinerstates_state" btree (state)
    "marinerstates_state_cur_mariners" btree (state) WHERE endtime IS NULL
    "marinerstates_state_past_state" btree (state, starttime, endtime) WHERE endtime IS NOT NULL
    "marinerstates_vslid" btree (vslid)
    "marinerstates_vslid_cur_mariners" btree (vslid) WHERE endtime IS NULL
    "marinerstates_vslid_past_state" btree (vslid, starttime, endtime) WHERE endtime IS NOT NULL

vessels
=====
 name               | character varying(200) | not null
 id                 | integer                | not null default nextval(('public.vessels_id_seq'::text)::regclass)
...
Indexes:
    "vessels_pkey" PRIMARY KEY, btree (id)
    "vessels_name_key" UNIQUE, btree (name)
    "idx_name" btree (name)
    "vessels_flag" btree (flag)
    "vessels_groupno" btree (groupno)
    "vessels_vslstatus_idx" btree (vslstatus)

dynacom=# SELECT count(*) from mariner;
 count
-------
 14447

dynacom=# SELECT count(*) from marinerstates;
 count
-------
 51013

dynacom=# SELECT avg(marqry.cnt),stddev(marqry.cnt) FROM (SELECT m.id,count(ms.id) as cnt from mariner m, marinerstates ms WHERE m.id=ms.marinerid group by m.id) AS marqry;
        avg         |       stddev
--------------------+--------------------
 3.5665944207508914 | 4.4416879361829170

(vessels do not play any impact in the query, so i'll leave them out)

Slow plan in 9.0.2 :
                                                                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=11525.09..11571.55 rows=3717 width=23) (actual time=10462.561..10462.937 rows=603 loops=1)
   ->  Sort  (cost=11525.09..11534.38 rows=3717 width=23) (actual time=10462.560..10462.664 rows=603 loops=1)
         Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character varying)), m.id
         Sort Method:  quicksort  Memory: 71kB
         ->  Hash Join  (cost=8281.98..11304.67 rows=3717 width=23) (actual time=10425.261..10461.621 rows=603 loops=1)
               Hash Cond: (ms.marinerid = m.id)
               ->  Hash Join  (cost=20.12..2963.83 rows=3717 width=4) (actual time=0.228..34.993 rows=2625 loops=1)
                     Hash Cond: (ms.vslid = vsl.id)
                     ->  Seq Scan on marinerstates ms  (cost=0.00..2889.32 rows=4590 width=8) (actual time=0.011..33.494 rows=2625 loops=1)
                           Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date))
                     ->  Hash  (cost=16.72..16.72 rows=272 width=4) (actual time=0.207..0.207 rows=272 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 10kB
                           ->  Seq Scan on vessels vsl  (cost=0.00..16.72 rows=272 width=4) (actual time=0.004..0.118 rows=272 loops=1)
               ->  Hash  (cost=8172.57..8172.57 rows=7143 width=23) (actual time=10424.994..10424.994 rows=12832 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 702kB
                     ->  Seq Scan on mariner m  (cost=7614.86..8172.57 rows=7143 width=23) (actual time=10409.498..10419.971 rows=12832 loops=1)
                           Filter: ((NOT (hashed SubPlan 1)) AND ((marinertype)::text = 'Mariner'::text))
                           SubPlan 1
                             ->  Unique  (cost=2768.00..7614.86 rows=1 width=4) (actual time=87.495..10408.446 rows=1454 loops=1)
                                   ->  Nested Loop  (cost=2768.00..7614.86 rows=1 width=4) (actual time=87.493..10407.517 rows=1835 loops=1)
                                         Join Filter: (msold.marinerid = mold.id)
                                         ->  Index Scan using mariner_pkey on mariner mold  (cost=0.00..1728.60 rows=14286 width=4) (actual time=0.007..13.931 rows=14286 loops=1)
                                               Filter: ((marinertype)::text = 'Mariner'::text)
                                         ->  Materialize  (cost=2768.00..5671.97 rows=1 width=8) (actual time=0.003..0.330 rows=1876 loops=14286)
                                               ->  Nested Loop  (cost=2768.00..5671.96 rows=1 width=8) (actual time=39.723..85.401 rows=1876 loops=1)
                                                     ->  Hash Semi Join  (cost=2768.00..5671.67 rows=1 width=12) (actual time=39.708..81.501 rows=1876 loops=1)
                                                           Hash Cond: (msold.marinerid = msold2.marinerid)
                                                           Join Filter: ((msold2.id <> msold.id) AND (msold2.starttime < msold.starttime) AND ((msold.starttime - msold2.endtime) <= '1 year 6 mons'::interval))
                                                           ->  Seq Scan on marinerstates msold  (cost=0.00..2889.32 rows=4590 width=20) (actual time=0.003..33.952 rows=2625 loops=1)
                                                                 Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date))
                                                           ->  Hash  (cost=2251.66..2251.66 rows=41307 width=24) (actual time=39.613..39.613 rows=41250 loops=1)
                                                                 Buckets: 8192  Batches: 1  Memory Usage: 2246kB
                                                                 ->  Seq Scan on marinerstates msold2  (cost=0.00..2251.66 rows=41307 width=24) (actual time=0.002..24.882 rows=41250 loops=1)
                                                                       Filter: ((state)::text = 'Active'::text)
                                                     ->  Index Scan using vessels_pkey on vessels vslold  (cost=0.00..0.28 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1876)
                                                           Index Cond: (vslold.id = msold.vslid)
 Total runtime: 10463.619 ms
(37 rows)

Fast plan in 8.3.13 :
                                                                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=633677.56..633700.48 rows=1834 width=23) (actual time=543.684..551.003 rows=603 loops=1)
   ->  Sort  (cost=633677.56..633682.14 rows=1834 width=23) (actual time=543.676..546.070 rows=603 loops=1)
         Sort Key: m.surname, (COALESCE(m.givenname, ''::character varying)), (COALESCE(m.midname, ''::character varying)), m.id
         Sort Method:  quicksort  Memory: 53kB
         ->  Hash Join  (cost=630601.65..633578.15 rows=1834 width=23) (actual time=439.969..540.573 rows=603 loops=1)
               Hash Cond: (ms.vslid = vsl.id)
               ->  Hash Join  (cost=630580.33..633530.01 rows=2261 width=27) (actual time=437.459..532.847 rows=603 loops=1)
                     Hash Cond: (ms.marinerid = m.id)
                     ->  Seq Scan on marinerstates ms  (cost=0.00..2875.32 rows=4599 width=8) (actual time=0.017..80.153 rows=2625 loops=1)
                           Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date))
                     ->  Hash  (cost=630491.54..630491.54 rows=7103 width=23) (actual time=437.307..437.307 rows=12832 loops=1)
                           ->  Index Scan using mariner_pkey on mariner m  (cost=628776.89..630491.54 rows=7103 width=23) (actual time=311.023..380.168 rows=12832 loops=1)
                                 Filter: ((NOT (hashed subplan)) AND ((marinertype)::text = 'Mariner'::text))
                                 SubPlan
                                   ->  Unique  (cost=0.00..628772.30 rows=1834 width=4) (actual time=0.129..303.981 rows=1454 loops=1)
                                         ->  Nested Loop  (cost=0.00..628767.72 rows=1834 width=4) (actual time=0.120..289.961 rows=1835 loops=1)
                                               ->  Nested Loop  (cost=0.00..627027.98 rows=1865 width=4) (actual time=0.099..237.128 rows=1876 loops=1)
                                                     ->  Index Scan using marinerstates_marinerid on marinerstates msold  (cost=0.00..626316.07 rows=2299 width=8) (actual time=0.079..186.150 rows=1876 loops=1)
                                                           Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date) AND (subplan))
                                                           SubPlan
                                                             ->  Bitmap Heap Scan on marinerstates msold2  (cost=4.28..12.11 rows=1 width=0) (actual time=0.020..0.020 rows=1 loops=2625)
                                                                   Recheck Cond: ((marinerid = $0) AND (starttime < $2))
                                                                   Filter: ((id <> $1) AND ((state)::text = 'Active'::text) AND (($2 - endtime) <= '1 year 6 mons'::interval))
                                                                   ->  Bitmap Index Scan on marinerstates_marinerid_starttime  (cost=0.00..4.28 rows=2 width=0) (actual time=0.009..0.009 rows=6 loops=2625)
                                                                         Index Cond: ((marinerid = $0) AND (starttime < $2))
                                                     ->  Index Scan using vessels_pkey on vessels vslold  (cost=0.00..0.30 rows=1 width=4) (actual time=0.006..0.010 rows=1 loops=1876)
                                                           Index Cond: (vslold.id = msold.vslid)
                                               ->  Index Scan using mariner_pkey on mariner mold  (cost=0.00..0.92 rows=1 width=4) (actual time=0.007..0.012 rows=1 loops=1876)
                                                     Index Cond: (mold.id = msold.marinerid)
                                                     Filter: ((mold.marinertype)::text = 'Mariner'::text)
               ->  Hash  (cost=17.81..17.81 rows=281 width=4) (actual time=2.491..2.491 rows=273 loops=1)
                     ->  Seq Scan on vessels vsl  (cost=0.00..17.81 rows=281 width=4) (actual time=0.012..1.306 rows=273 loops=1)
 Total runtime: 553.601 ms
(33 rows)

Is there any other data i could post (pg_stat,...) that would help?

thanx a lot.

Regards


Ing. Marcos LuÃs OrtÃz Valmaseda
Linux User # 418229 && PostgreSQL DBA
Centro de TecnologÃas GestiÃn de Datos (DATEC)
http://postgresql.uci.cu
http://www.postgresql.org
http://it.toolbox.com/blogs/sql-apprentice




--
Achilleas Mantzios

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Achilleas, here is the slow part from 9.02:

                                                    ->  Hash Semi Join  (cost=2768.00..5671.67 rows=1 width=12) (actual time=39.708..81.501 rows=1876 loops=1)
                                                          Hash Cond: (msold.marinerid = msold2.marinerid)
                                                          Join Filter: ((msold2.id <> msold.id) AND (msold2.starttime < msold.starttime) AND ((msold.starttime - msold2.endtime) <= '1 year 6 mons'::interval))
                                                          ->  Seq Scan on marinerstates msold  (cost=0.00..2889.32 rows=4590 width=20) (actual time=0.003..33.952 rows=2625 loops=1)
                                                                Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date))
                                                          ->  Hash  (cost=2251.66..2251.66 rows=41307 width=24) (actual time=39.613..39.613 rows=41250 loops=1)
                                                                Buckets: 8192  Batches: 1  Memory Usage: 2246kB
-> Seq Scan on marinerstates msold2 (cost=0.00..2251.66 rows=41307 width=24) (actual time=0.002..24.882

The same part from 8.3.3 looks like this:

Seq Scan on marinerstates ms  (cost=0.00..2875.32 rows=4599 width=8) (actual time=0.017..80.153 rows=2625 loops=1)
                          Filter: (((state)::text = 'Active'::text) AND ((starttime)::date <= '2007-01-11'::date) AND ((COALESCE(endtime, now()))::date >= '2006-07-15'::date))
                    ->  Hash  (cost=630491.54..630491.54 rows=7103 width=23) (actual time=437.307..437.307 rows=12832 loops=1)
                          ->  Index Scan using mariner_pkey on mariner m  (cost=628776.89..630491.54 rows=7103 width=23) (actual time=311.023..380.168 rows=12832 loops=1)
                                Filter: ((NOT (hashed subplan)) AND ((marinertype)::text = 'Mariner'::text))
                                SubPlan
                                  ->  Unique  (cost=0.00..628772.30 rows=1834 width=4) (actual time=0.129..303.981 rows=1454 loops=1)
                                        ->  Nested Loop  (cost=0.00..628767.72 rows=1834 width=4) (actual time=0.120..289.961 rows=1835 loops=1)


This leads me to the conclusion that the queries differ significantly. 8.3.3 mentions NOT hashed plan, I don't see it in 9.02 and the filtering conditions look differently. Are you sure that the plans are from the same query?

--
Mladen Gogala Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux