Jayadevan M wrote:
It is mentioned that table data blocks hold the tuple visibility information, and hence reading the table is always necessary. So how does PostgreSQL reduce the number of blocks to be read by using indexes?
To be useful, a query using an index must be selective: it must return only a fraction of the possible rows in the table. Scanning the index produces a list of blocks that contain the potentially visible data, and then only those data blocks are retrieved and checked for visibility.
Let's say you have a table that's 100 pages (pages are 8KB each) and a 50-page index on it. You run a query that selects only 5% of the rows in the table, from one contiguous section. As a very rough estimate, it will look at 5% * 50 ≈ 3 index pages. Those will point to a matching set of 5% * 100 = 5 data pages. Now you've found the right subset of the data by retrieving only 8 random pages instead of 100. With random_page_cost=4.0, that gives this plan a cost of around 8 * 4.0 = 32, while the sequential scan would pay 100 * 1.0 for sequential accesses, for a cost of around 100. (Both of them would also have some smaller per-row processing cost added in.)
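To see that tradeoff in practice, here's a minimal sketch against a hypothetical table named measurements with an index on device_id (the actual plans and cost numbers EXPLAIN reports will depend entirely on your data, statistics, and settings):

    -- Hypothetical setup: a table with an indexed column
    CREATE TABLE measurements (
        id        serial PRIMARY KEY,
        device_id integer,
        reading   numeric
    );
    CREATE INDEX measurements_device_id_idx ON measurements (device_id);

    -- A selective predicate lets the planner read a few index pages plus
    -- only the data pages they point to:
    EXPLAIN SELECT * FROM measurements WHERE device_id = 42;

    -- A predicate matching most of the table will usually fall back to a
    -- sequential scan, because reading every page in order is cheaper
    -- than a large number of random page fetches:
    EXPLAIN SELECT * FROM measurements WHERE device_id > 0;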
It's actually a bit more complicated than that (the way indexes are built means you can't just estimate their usage linearly, and scans of non-contiguous sections are harder to model simply), but that should give you an idea. Only when using the index significantly narrows the number of data pages expected to be read will it be an improvement over ignoring the index and just scanning the whole table.
If the expected use of the index were only 20% selective for another query, you'd be getting 20% * 50 = 10 index pages and 20% * 100 = 20 data pages, for a potential total of 30 random page lookups. That could end up costing 30 * 4.0 = 120, higher than the sequential scan's 100. Usually the breakpoint, where scanning the whole table sequentially becomes cheaper than using the index, falls somewhere around 20% of the table, and you can shift it around by adjusting random_page_cost. Make it lower, and you can end up preferring index scans even when 30 or 40% of a table is being read.
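If you want to watch that crossover move, you can change random_page_cost for just your session and re-run EXPLAIN. This is only a sketch, using the same hypothetical measurements table as above; the exact selectivity where the plan flips depends on the table's statistics:

    -- Cheaper random fetches shift the planner toward index scans even at
    -- lower selectivity (session-local setting; reset it afterward):
    SET random_page_cost = 1.5;
    EXPLAIN SELECT * FROM measurements WHERE device_id BETWEEN 1 AND 40;
    RESET random_page_cost;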
Do the index entries get updated only when data is committed and made 'visible', or do they get updated as soon as the data is changed, before the commit is issued, with a rollback of the transaction resulting in a rollback of the index entries?
Index changes happen when the data goes into the table, including situations where it might never be committed. The index change doesn't get deferred to commit time, the way you can defer things like foreign key checks. When a transaction is rolled back, the aborted row eventually gets marked as dead by vacuum, at which point any index entries pointing to it can also be cleaned up.
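A quick way to see that behavior (again just a sketch against the hypothetical measurements table; the statistics counters are updated asynchronously, so the counts are approximate):

    -- A rolled-back INSERT still wrote a heap tuple and an index entry;
    -- the rollback just leaves them behind marked as aborted.
    BEGIN;
    INSERT INTO measurements (device_id, reading) VALUES (42, 1.0);
    ROLLBACK;

    -- The aborted row eventually shows up as a dead tuple:
    SELECT n_dead_tup FROM pg_stat_user_tables WHERE relname = 'measurements';

    -- VACUUM reclaims the dead heap tuple and the index entries pointing at it:
    VACUUM measurements;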
--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@xxxxxxxxxxxxxxx  www.2ndQuadrant.us