Re: Use order by clause, got index scan involved

高健 <luckyjackgao@xxxxxxxxx> · Fri, 9 Nov 2012 09:48:52 +0800

Hi Jeff

Thank you for your reply. 
I will try to learn about effective_cache_size .
Jian gao 

2012/11/9 Jeff Janes <jeff.janes@xxxxxxxxx>

On Wed, Nov 7, 2012 at 11:41 PM, 高健 <luckyjackgao@xxxxxxxxx> wrote:

Hi all:

What confused me is that:  When I
select data using order by  clause, I got
the following execution plan:

postgres=# set session enable_indexscan=true;                                                                                               

SET                                                                                                                                         

postgres=# explain SELECT * FROM pg_proc ORDER BY oid;                                                                                          

QUERY PLAN                                                                                                                               

----------------------------------------------------------------------------------------                                                                                             

 Index Scan using
pg_proc_oid_index on pg_proc 
(cost=0.00..321.60 rows=2490 width=552)                                                 

You should probably use sample cases much larger than this when trying to understand the planner.  With queries this small, it almost doesn't matter what plan is chosen.

(1 row)                                                                                                                                 

postgres=#

My Question is : 

 If I want to find record using the where clause which hold the id column, 
the index scan might be used.

But  I just want to get all the  records on sorted output format,  Why index scan can be used here?

I can’t imagine  that:

Step 1 Index is read into memory, then for each tuple in it, 

Step 2 Then we got  the address of  related data block, and then access the data
block .

Step 2 will be repeated for many times. I think it is not efficient.

But step 2 will repeatedly find the block it is visiting to already be in memory, so it is efficient.

Maybe the database system is clever enough to  accumulate data access for same physical
page, and  reduce the times of physical
page acess ?

There is a bitmap scan which does that, but such a scan can't be used to fulfill a sort, because it doesn't return the rows in index order.  What reduces the cost here is the various levels of caching implemented by the file system, the memory system, and the CPU.  PG uses "effective_cache_size" to try to account for these effects, although I admit I don't quite understand what exactly it is doing in this case.  I thought that setting effective_cache_size to absurdly low values would make the index scan cost estimate go up a lot, but it only made it go up a little.

Cheers,

Jeff