Re: using CURSOR with PHP




on 5/18/02 4:24 PM, andrew@xxxxxxxxxxxxxxx purportedly said:

> PostgreSQL will certainly take LIMIT into account when planning queries.
> 
> EXPLAIN SELECT * FROM mybigtable WHERE x = y;
> 
> and 
> 
> EXPLAIN SELECT * FROM mybigtable WHERE x = y LIMIT n;
> 
> Will regularly show different query plans.

Limit *is* taken into account, and a different query plan may be used
depending on the LIMIT and OFFSET values. However, as you will see below,
the query plans are all essentially identical (aside from the added Limit
node and its cost, and the use of an index to satisfy the ordering). Here is
a real example with a table containing 32k records:

milewiz=> explain select * from activity;
Seq Scan on activity  (cost=0.00..20.00 rows=1000 width=39)

milewiz=> explain select * from activity limit 200;
Limit  (cost=0.00..4.00 rows=200 width=39)
  ->  Seq Scan on activity  (cost=0.00..20.00 rows=1000 width=39)

Here a sequential scan is still used, but because only the first 200 tuples
are needed, results come back much faster. Adding a WHERE clause would
likely produce a similar plan. However, LIMIT is next to useless unless the
results are ordered, which the application in question would require anyway:

milewiz=> explain select * from activity order by date;
Index Scan using act_date_idx on activity  (cost=0.00..59.00 rows=1000
width=39)

milewiz=> explain select * from activity order by date limit 200;
Limit  (cost=0.00..11.80 rows=200 width=39)
  ->  Index Scan using act_date_idx on activity  (cost=0.00..59.00 rows=1000
width=39)

Here the cost is much higher, partly because of the ordering, and partly
because more rows must be gathered to satisfy it. But only the first call
can use LIMIT n alone--subsequent calls must add an OFFSET. Consider:

milewiz=> explain select * from activity order by date limit 200 offset 200;
Limit  (cost=11.80..23.60 rows=200 width=39)
  ->  Index Scan using act_date_idx on activity  (cost=0.00..59.00 rows=1000
width=39)

milewiz=> explain select * from activity order by date limit 200 offset 800;
Limit  (cost=47.20..59.00 rows=200 width=39)
  ->  Index Scan using act_date_idx on activity  (cost=0.00..59.00 rows=1000
width=39)

milewiz=> explain select * from activity order by date limit 200 offset
10000;
Limit  (cost=59.00..59.00 rows=1 width=39)
  ->  Index Scan using act_date_idx on activity  (cost=0.00..59.00 rows=1000
width=39)

The optimizer estimates that a complete table scan will return only 1000
rows, when in fact the table holds 32k rows. So as we page through the
calls, at a certain point the optimizer will execute a plan with the same
cost as a full table scan, regardless of the LIMIT. Running VACUUM ANALYZE
frequently keeps these estimates closer to reality.
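The progression in the EXPLAIN output above can be reproduced with simple
arithmetic: the planner pro-rates the underlying scan's total cost linearly
over its estimated row count and clamps at the end. Here is a rough sketch
in Python (my own model of the cost numbers shown above, not the actual
PostgreSQL planner code):

```python
def limit_cost(total_cost, est_rows, limit, offset):
    """Approximate the Limit node's cost report: pro-rate the child
    plan's total cost over its estimated row count, clamping at 100%.
    This is a sketch modeled on the EXPLAIN figures above."""
    start_frac = min(offset / est_rows, 1.0)
    end_frac = min((offset + limit) / est_rows, 1.0)
    startup = round(total_cost * start_frac, 2)
    total = round(total_cost * end_frac, 2)
    rows = max(1, min(limit, est_rows - offset))
    return startup, total, rows

# Matches the index scan above (cost=0.00..59.00, estimated 1000 rows):
print(limit_cost(59.00, 1000, 200, 200))    # (11.8, 23.6, 200)
print(limit_cost(59.00, 1000, 200, 800))    # (47.2, 59.0, 200)
print(limit_cost(59.00, 1000, 200, 10000))  # (59.0, 59.0, 1)
```

Once the offset passes the estimated row count, startup and total cost both
hit the full-scan figure, which is exactly what the last EXPLAIN shows.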

And as you can see, the query plans are all identical. Performance will
depend mostly on the distribution of field values, which affects how many
records must be scanned in order to fulfill the LIMIT. At a certain point
this becomes the same as a full scan; with smaller result sets, that point
comes sooner.
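For the paging pattern discussed here, each subsequent call derives its
OFFSET from the page number. A minimal sketch (in Python for brevity; the
table and column names mirror my examples, and `page_query` is a
hypothetical helper--from PHP you would pass the placeholders to your
Postgres driver the same way):

```python
def page_query(table, order_col, page, per_page=200):
    """Build a parameterized LIMIT/OFFSET query for a 1-based page
    number. Sketch only: table/column names are interpolated here for
    illustration; values go through driver placeholders."""
    offset = (page - 1) * per_page
    sql = f"SELECT * FROM {table} ORDER BY {order_col} LIMIT %s OFFSET %s"
    return sql, (per_page, offset)

sql, params = page_query("activity", "date", page=3)
print(sql)     # SELECT * FROM activity ORDER BY date LIMIT %s OFFSET %s
print(params)  # (200, 400)
```

Note that, per the plans above, page 50 of a 32k-row table costs about as
much as scanning the whole table, so deep paging this way gets expensive.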

> For more complex queries, however, the plans are less likely to differ
> 
Actually, as my examples above show, the reverse is true: the simple queries
all produced the same plan. The more complex the query, the more choices the
optimizer has when picking what it considers the best plan, which
unfortunately is not always the truly best one.
> 
> OTOH PostgreSQL does _not_ do result caching, unless you have applied
> some patches that were floating around last year some time.

Yes, I recalled wrong. Postgres does not cache query results, but it does
cache table pages. Subsequent queries against a table will speed up as long
as there is enough shared memory to hold its pages; otherwise the benefit
varies with the results. If only part of a table fits in the cache, queries
that can be answered entirely from the cached portion will still benefit.

Keary Suska
Esoteritech, Inc.
"Leveraging Open Source for a better Internet"


