On Wed, Apr 27, 2011 at 8:40 PM, Kevin Grittner <Kevin.Grittner@xxxxxxxxxxxx> wrote: > Sok Ann Yap Âwrote: >> Kevin Grittner Âwrote: > >>> Please show us your overall configuration and give a description >>> of the hardware (how many of what kind of cores, how much RAM, >>> what sort of storage system). > >> Here's the configuration (this is just a low end laptop): > >> version | PostgreSQL 9.0.4 on x86_64-pc-linux-gnu, >> compiled by GCC x86_64-pc-linux-gnu-gcc (Gentoo 4.5.2 p1.0, >> pie-0.4.5) 4.5.2, 64-bit >> checkpoint_segments | 16 >> default_statistics_target | 10000 > > Usually overkill. ÂIf this didn't help, you should probably change it > back. > >> effective_cache_size | 512MB >> lc_collate | en_US.UTF-8 >> lc_ctype | en_US.UTF-8 >> listen_addresses | * >> log_destination | syslog >> log_min_duration_statement | 0 >> maintenance_work_mem | 256MB >> max_connections | 100 > > You probably don't need this many connections. > >> max_stack_depth | 2MB >> port | 5432 >> random_page_cost | 4 >> server_encoding | UTF8 >> shared_buffers | 256MB >> silent_mode | on >> TimeZone | Asia/Kuala_Lumpur >> wal_buffers | 1MB >> work_mem | 32MB >> (20 rows) > > It's hard to recommend other changes without knowing the RAM on the > system. ÂHow many of what kind of CPUs would help, too. > >> The thing is, the query I posted was fairly simple (I think), and >> PostgreSQL should be able to choose the 3000+ times faster index >> scan with the default random_page_cost of 4. > > It picks the plan with the lowest estimated cost. ÂIf it's not > picking the best plan, that's usually an indication that you need to > adjust cost factors so that estimates better model the actual costs. > >> If I need to reduce it to 2 when using a 5.4k rpm slow disk, what >> is random_page_cost = 4 good for? > > It's good for large databases with a lot of physical disk I/O. ÂIn > fact, in some of those cases, it needs to be higher. ÂIn your test, > the numbers indicate that everything was cached in RAM. ÂThat makes > the effective cost very low. > > Also, the odds are that you have more total cache space between the > shared_buffers and the OS cache than the effective_cache_size > setting, so the optimizer doesn't expect the number of cache hits > you're getting on index usage. > > -Kevin > Thanks for the tips and explanation. I wrongly assumed the random_page_cost value is independent from caching. Now, let's go back to the original query: SELECT salutations.id, salutations.name, EXISTS ( SELECT 1 FROM contacts WHERE salutations.id = contacts.salutation_id ) AS in_use FROM salutations If I split up the query, i.e. running this once: SELECT salutations.id, salutations.name FROM salutations and then running this 44 times, once for each row: SELECT EXISTS ( SELECT 1 FROM contacts WHERE contacts.salutation_id = ? ) AS in_use I can see that PostgreSQL will smartly pick the best plan, i.e. for common salutations (Madam, Ms, etc), it will do sequential scan, while for salutations that are rarely used or not used at all, it will do index scan. Anyway, the overhead of spawning 44 extra queries means that it is still better off for me to stick with the original query and tune PostgreSQL to choose index scan. -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance