Re: Btree indexes, large numbers and <= comparisons

Toke Høiland-Jørgensen <toke@xxxxxxx> · Thu, 29 Mar 2007 15:33:15 +0200

On Thursday 29 March 2007 13:33, Alejandro D. Burne wrote:
> 2007/3/29, Toke Høiland-Jørgensen <toke@xxxxxxx>:
> > I have a table with ~5 million rows containing ranges of large (8-digit)
> > numbers. The table has an int4 field for the range start and the range
> > end, and a field which is null if that particular range is expired, and
> > has a value otherwise.
> >
> > I need to query this table to find a range containing a particular
> > number, e.g. a query might look like this:
> >
> > SELECT * FROM table_name WHERE range_start <= 87654321 AND range_end >=
> > 87654321 AND expired IS NULL
> >
> > My problem is that when I run a query like the above, the query planner
> > does a sequential scan, even though i have an index on both the query
> > columns separately, as well as an index containing both columns. The
> > indexes are defined like this:
> >
> > CREATE INDEX range_start_end_index ON table_name USING btree
> > (range_start, range_end) WHERE expired IS NULL
> > CREATE INDEX range_start_index ON table_name USING btree (range_start)
> > WHERE expired IS NULL
> > CREATE INDEX range_end_index ON table_name USING btree (range_end) WHERE
> > expired IS NULL
> >
> > When I do a query for smaller numbers (7-digit and below, as far as I can
> > see), the query planner uses the index(es) and the query is
> > instantaneous. However, when I run a query like the above, the planner
> > decides to do a sequential scan of the entire table.
> >
> > I realize this probably has something to do with the planner only
> > searching for the first part of the WHERE clause (i.e. range_start <=
> > 87654321) and deciding that this will probably yield so many rows that a
> > sequential scan will yield results that are just as good. However, the
> > data is structured in such a way that multiple ranges containing the same
> > number (and which are not expired) do not exist. So in reality there will
> > be either 1 or 0 results for a query like the above.
> >
> > How do I make the query planner realize that using the index is a Good
> > Thing(tm)?
> >
> > Any help will be greatly appreciated.
> >
> > Regards,
> > -Toke
>
> Can you send an explain analyze for that query?
>
> Alejandro
>

Sure:

=> EXPLAIN ANALYSE SELECT * FROM table_name WHERE range_start <= 87654321 AND 
range_end >= 87654321 AND expired IS NULL;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Seq Scan on table_name  (cost=0.00..137046.73 rows=161972 width=109) (actual 
time=3894.471..3894.471 rows=0 loops=1)
   Filter: ((range_start <= 87654321) AND (range_end >= 87654321) AND (expired 
IS NULL))
 Total runtime: 3894.531 ms
(3 rows)

The same thing with a smaller number:

=> EXPLAIN ANALYSE SELECT * FROM table_name WHERE range_start <= 8765432 AND 
range_end >= 8765432 AND expired IS NULL;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using range_start_index on table_name  (cost=0.00..800.61 rows=200 
width=109) (actual time=26.839..26.839 rows=0 loops=1)
   Index Cond: (range_start <= 8765432)
   Filter: ((range_end >= 8765432) AND (expired IS NULL))
 Total runtime: 26.899 ms
(4 rows)

Thanks for your prompt reply :)

-Toke