On 02/12/2017 23:17, Jeff Janes wrote:
I can't reproduce the exact issue on the real database any more. The query started to use the slow bitmap scan recently, and had been doing so for some time lately, but now it's switched back to use the index scan. The table involved in the query gets modified a lot. It has hundreds of millions of rows. Lots of new rows are appended to it every day, the oldest rows are sometimes removed. The table is analyzed at least daily. It's possible that statistics was updated and that caused the query to run differently. But I still would like to understand why that issue happened, and how to properly fix it, in case the issue returns. In the real database, not everything is in memory. There are 200GB+ of RAM, but DB is 500GB+. The table involved in the query itself is 60GB+ of data and 100GB+ of indexes. I'm running the test case in a way where all reads are done from RAM, only to make it easier to reproduce and to avoid unrelated effects. As far as know, costs in Postgres were designed to be relative to seq_page_cost, which for that reason is usually defined as 1.0. Even if everything would be in RAM, accesses to the pages would still not have zero cost. Setting 0.0 just seems too extreme, as all other non-zero costs would become infinitely bigger. Exactly. I'd like to understand why the worse plan is being chosen, and 1) if it's fixable by tuning costs, to figure out the right settings which could be used in production, 2) if there is a bug in Postgres optimizer, then to bring some attention to it, so that it's eventually fixed in one of future releases, 3) if Postgres is supposed to work this way, then at least I (and people who ever read this thread) would understand it better. Regards, Vitaliy
|