Re: will the planner ever use an index when the condition is <> ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/17/2011 11:24 AM, Filip Rembiałkowski wrote:
Normally there is no chance it could work,
because (a) the planner does not know all possible values of a column,
and (b) btree indexes cannot search on "not equal" operator.

Is there an index type that can check "not equal"?
This specific column has a limited number of possible values - it is essentially an enumerated list.

Roxanne


BTW I've just made a case where - logically - it could work, but it
still does not:

create table nums ( num int4 not null, check(num=1 or num=2) );
insert into nums select case when random()<=0.99 then 1 else 2 end
from generate_series(1,1000000);
create index nums_idx on nums(num);
analyze nums;
set constraint_exclusion to 'on';
explain select * from nums where num<>1;
--planner could estimate selectivity as 1%, and use index with "=2"
filter basing on check constraint?




2011/12/17 Roxanne Reid-Bennett<rox@xxxxxxxxxxx>:
I have a query that used<>  against an indexed column. In this
case I can use the reverse and use in or = and get the performance
I need... but "in general"... will the planner ever use an index when the
related column is compared using<>?

I feel like the answer is no, but wanted to ask.

Roxanne
Postgres Version 8.4.9 PostGIS version 1.5.2



Context for question:

I have the following query:

select *
from op_region opr, yield_segment_info ysi, data_location dl
where opr.op_region_id in
         (select distinct op_region_id
          from yield_point
          where yield>  0
          and area>  0
          and ST_GeometryType(location)<>  'ST_Point'
         )
and ysi.op_region_id = opr.op_region_id
and dl.data_set_id = opr.data_set_id

Yield_Point has 161,575,599 records
where yield>0 and area>  0 has 161,263,193 records,
where ST_GeometryType(location)<>  'ST_Point' has just 231 records

yield_segment_info has 165,929 records
op_region has 566,212 records
data_location has 394,763

All of these have a high volume of insert/delete's.
The tables have recently been vacuum full'd and the indexes reindexed.
[they are under the management of the autovacuum, but we forced a cleanup on
the chance that things had degraded...]

If I run an explain analyze:

"Nested Loop
    (cost=5068203.00..5068230.31 rows=3 width=225308)
    (actual time=192571.730..193625.728 rows=236 loops=1)"
"->Nested Loop
      (cost=5068203.00..5068219.66 rows=1 width=57329)
      (actual time=192522.573..192786.698 rows=230 loops=1)"
"  ->Nested Loop
        (cost=5068203.00..5068211.36 rows=1 width=57268)
        (actual time=192509.822..192638.446 rows=230 loops=1)"
"    ->HashAggregate
           (cost=5068203.00..5068203.01 rows=1 width=4)
           (actual time=192471.507..192471.682 rows=230 loops=1)"
"       ->Seq Scan on yield_point
              (cost=0.00..5068203.00 rows=1 width=4)
              (actual time=602.174..192471.177 rows=230 loops=1)"
"             Filter: ((yield>  0::double precision) AND
                       (area>  0::double precision) AND
                       (st_geometrytype(location)<>  'ST_Point'::text))"
"    ->Index Scan using op_region_pkey on op_region opr
           (cost=0.00..8.33 rows=1 width=57264)
           (actual time=0.723..0.723 rows=1 loops=230)"
"          Index Cond: (opr.op_region_id = yield_point.op_region_id)"
"  ->Index Scan using yield_segment_info_key on yield_segment_info ysi
       (cost=0.00..8.29 rows=1 width=65)
       (actual time=0.643..0.643 rows=1 loops=230)"
"      Index Cond: (ysi.op_region_id = opr.op_region_id)"
"->Index Scan using data_location_data_set_idx on data_location dl
    (cost=0.00..10.61 rows=3 width=167979)
    (actual time=3.611..3.646 rows=1 loops=230)"
"Index Cond: (dl.data_set_id = opr.data_set_id)"
"Total runtime: 193625.955 ms"

yield_point has the following indexes:
      btree on ST_GeometryType(location)
      gist on location
      btree on op_region_id

I've also tried an index on
      ((yield>  0::double precision) AND (area>  0::double precision) AND
(st_geometrytype(location)<>  'ST_Point'::text))
... it still goes for the sequential scan.

But if I change it to st_geometrytype(location) = 'ST_Polygon' or
even in ('ST_Polygon','ST_MultiPolygon')

the planner uses the index.

Roxanne

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux