> But I doubt your answer. I think the essence of the problem is when the > planner selects 'Bitmap Index Scan' and how the planner computes the cost > of 'Bitmap Index Scan'. The essence of the problem obviously is a bad estimate of the cost. The planner considers the two plans, computes the costs and then chooses the one with the lower cost. But obviously the cost does not reflect the reality (first time when the query is executed and the filesystem cache is empty). > Tom Lane said ïïIn principle a bitmap index scan should be significantly > faster if the index can return the bitmap more or less "natively" rather > than having to construct it. My recollection though is that a significant > amount of work is needed to make that happen, and that there is no > existing patch that tackled the problem. So I'm not sure that this report > should be taken as indicating that there's no chance of a SELECT > performance improvement. What it does say is that we have to do that work > if we want to make bitmap indexes useful.ïï Tom Lane is right (as usual). The point is that when computing the cost, planner does not know whether the data are already in the filesystem cache or if it has to fetch them from the disk (which is much slower). > Okay, I want to know how the planner computes the cost of constructing > bitmap. And when the planner computes the cost of 'Bitmap Index Scan', if > it considers the influence of memory cache? As when I do not clear the > memory cache, I find the 'Bitmap Index Scan' is real fast than 'Seq > Scan'. There are two things here - loading the data from a disk into a cache (filesystem cache at the OS level / shared buffers at the PG level), and then the execution itself. PostgreSQL estimates the first part using an effective_cache_size hint, and uses that to estimate the probability that the data are already in the filesystem cache. But you're confusing him by the 'reboot' which results in an empty cache. The plan itself seems fine to me - you might play with the cost variables, but I think it won't improve the overall perfomance. Actually what you see is a worst case scenario - the plan is not bad if the data are in a cache (filesystem or shared buffers), but when Pg has to read the data from the disk, performance sucks. But is this reflecting reality? How often is the query executed? What other queries are executed on the box? What is the size of shared_buffers? If the query is executed often (compared to other queries) and the shared buffers is set high enough, most of the table will remain in the shared buffers and everything will work fine. Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance