Re: Query about index usage

Jayadevan M wrote:
It is mentioned that table data blocks hold the data about tuple visibility, and hence table scans are always necessary. So how does PostgreSQL reduce the number of blocks to be read by using indexes?

To be useful, a query utilizing an index must be selective: it must return only a fraction of the possible rows in the table. Scanning the index produces a list of blocks that contain the potentially visible data; only those data blocks are then retrieved and tested for visibility.
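Here's a minimal sketch of what that looks like in practice (the table and column names are made up for illustration; your plan may differ):

CREATE TABLE t (id integer, payload text);
INSERT INTO t SELECT i, repeat('x', 100) FROM generate_series(1, 100000) i;
CREATE INDEX t_id_idx ON t (id);
ANALYZE t;

EXPLAIN SELECT * FROM t WHERE id BETWEEN 1 AND 5000;
-- Expect something like "Index Scan using t_id_idx on t", or a Bitmap
-- Index Scan feeding a Bitmap Heap Scan. Either way, only the heap pages
-- the index points at get read, and each fetched row is still checked
-- for visibility against the heap tuple itself.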

Let's say you have a table that's 100 pages (pages are 8KB) and an index against it that's 50 pages. You run a query that selects only 5% of the rows in the table, from a contiguous section. As a very rough estimate, it will look at 5% * 50 = 3 index pages, and those will point to a matching set of 5% * 100 = 5 data pages. Now you've found the right subset of the data by retrieving only 8 random pages instead of 100. With random_page_cost=4.0, that gives this plan a cost of around 8 * 4.0 = 32, while the sequential scan would cost 100 * 1.0 (sequential accesses) for a total of around 100. (Both plans would also have some smaller row-processing cost added in.)
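If you want to check those numbers against a real table (continuing the hypothetical example above), relpages in pg_class reports each relation's size in 8KB pages, and you can force the planner's hand to see both cost estimates side by side:

SELECT relname, relpages FROM pg_class
WHERE relname IN ('t', 't_id_idx');

EXPLAIN SELECT * FROM t WHERE id BETWEEN 1 AND 5000;  -- index plan cost
SET enable_indexscan = off;
SET enable_bitmapscan = off;
EXPLAIN SELECT * FROM t WHERE id BETWEEN 1 AND 5000;  -- seq scan cost
RESET enable_indexscan;
RESET enable_bitmapscan;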

It's actually a bit more complicated than that--the way indexes are built means you can't just linearly estimate their usage, and scans of non-contiguous sections are harder to model simply--but that should give you an idea. Only when using the index significantly narrows the number of data pages expected will it be an improvement over ignoring the index and just scanning the whole table.

If the expected use of the index were only 20% selective for another query, you'd be reading 20% * 50 = 10 index pages and 20% * 100 = 20 data pages, for a potential total of 30 random page lookups. That could end up costing 30 * 4.0 = 120, higher than the sequential scan's 100. The breakpoint, where scanning the whole table sequentially becomes cheaper than using the index, usually falls near 20% of the table, and you can shift it around by adjusting random_page_cost. Make it lower, and the planner can end up preferring index scans even for 30 or 40% of a table.
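As a quick sketch of shifting that breakpoint (again using the hypothetical table above; the plan actually chosen will depend on your statistics and hardware settings):

SHOW random_page_cost;                                 -- 4.0 by default
EXPLAIN SELECT * FROM t WHERE id BETWEEN 1 AND 20000;  -- ~20% selective
SET random_page_cost = 1.5;                            -- e.g. for fast disks
EXPLAIN SELECT * FROM t WHERE id BETWEEN 1 AND 20000;  -- index more likely now
RESET random_page_cost;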

Does index data get updated as and when data is committed and made 'visible', or does index data get updated as soon as data is changed, before commit is issued, so that a rollback of the transaction results in a rollback of the index data?

Index changes happen when the data goes into the table, including in situations where it might never be committed. The index change never gets deferred to commit time, the way things like foreign key checks can be. When a transaction is rolled back, the aborted row eventually gets marked as dead by vacuum, at which point any index entries pointing to it can also be cleaned up.
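A small demonstration of that lifecycle, using the hypothetical table from earlier (n_dead_tup is the statistics view's count of dead row versions, which can lag slightly behind reality):

BEGIN;
INSERT INTO t SELECT i, 'aborted' FROM generate_series(200001, 210000) i;
ROLLBACK;
-- The index entries for those rows were created by the INSERT and are
-- still physically present; the heap rows are simply invisible now.
SELECT n_dead_tup FROM pg_stat_user_tables WHERE relname = 't';
VACUUM VERBOSE t;  -- removes the dead heap tuples and their index entries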

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@xxxxxxxxxxxxxxx   www.2ndQuadrant.us

