On 10/1/07, Jan Theodore Galkowski <bayesianlogic@xxxxxxx> wrote:
> Scott,
>
> i didn't think this belonged in the general list, but the example i gave
> for discussion was a toy, for illustration.  i could not very well post
> the actual example for many reasons, including proprietary ones and,
> given this is how things work, because the 1.5 million row table in
> question is its own smallest description.

This is the exact kind of question that belongs on -general.  But it
does get asked a lot, especially by people coming from other databases.

> while indexes are being used on that table, there's a companion table
> which is much smaller -- a "mere" 75000 rows -- which is suffering a
> sequential scan, and i was trying to eliminate those.

Well, don't think of sequential scans as plain bad.  Sometimes they're
the best choice, sometimes they're not.  Now, if an index scan is
provably and noticeably faster than the sequential scan, then the
planner is making the wrong decision.  Have you tried running your query
with

  set enable_seqscan = off;

to see how it behaves?  (There's a quick recipe for trying that safely
at the end of this mail.)  I've found many somewhat slow queries got
really fast, or really slow, when I did that.  Note that you shouldn't
blindly run a query with that setting all the time, as there are many
cases where a seqscan is the right answer.  Also, your explain cost
estimates will all be way off.

> perhaps it is true that ANALYZE isn't being done often enough.  perhaps
> VACUUMs aren't being done often enough either.  we're leery of
> scheduling repeated VACUUMs having encountered a case where the VACUUM
> took over an hour to complete.

Run "vacuum verbose" on your db and see what it says about the number of
page slots needed versus used; that will help you tell whether you're
vacuuming enough.  How long vacuum takes isn't really that important.
What is important is how much of an impact it's having on the system.
There are several vacuum cost parameters in postgresql.conf that can
lower the I/O impact vacuum has on your system while increasing its run
time (example settings at the end of this mail).  Vacuum full is another
story.  Think of it as a recovery tool, not a periodic maintenance tool.

> it may, too, be because the tables use user-defined types heavily and
> the original UPDATE involved a SELECT ... IN ... having a GROUP BY with
> a few references to columns deep within user-defined types.

Hard to say without the query and its explain analyze output.  It's
common for user-defined functions to produce planner estimates that are
way off; user-defined types, not so much.  But the more complex the
query, the more likely it is that the planner will make a bad estimate
of the number of rows somewhere and choose a bad plan.

> that wouldn't have been my choice, but, then, they were given to me to
> work, not my design.  in fact, PG is the first relational database
> implementation i've used that offered such things in a big way.

Extensibility is quite a useful tool.

> i also don't understand some other things, which are surprising, like
> why some UPDATEs take so much longer when wrapped in a BEGIN
> TRANSACTION-COMMIT than when having the transaction at a statement
> level.

That is strange.  I'd expect that maybe you've got something happening
with the transaction waiting on other transactions, so that it's not so
much running hard as just tapping its toe waiting for the other
transaction to commit or roll back.
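A quick way to check for that (just a sketch; the pg_stat_activity
column names below are the 8.x ones, so adjust for your version) is to
look for ungranted locks and blocked backends while the slow UPDATE is
running:

  -- lock requests that have not been granted yet
  select locktype, relation::regclass, pid, mode, granted
    from pg_locks
   where not granted;

  -- backends that are currently blocked, and what they are running
  select procpid, usename, waiting, current_query
    from pg_stat_activity
   where waiting;

If the UPDATE's backend shows up as waiting, the problem is locking, not
the plan.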
> I come from an Oracle, DB2, Informix world, and in my experience
> plans for queries are more stable.  i have loitered in and around MySQL
> for a while.  i'm not surprised there's a learning curve with PG.  i am
> surprised it breaks so markedly with mainstay database experience.

Oh, I've seen Oracle get stupid due to a lack of proper statistics as
well.  You likely just had a good DBA who kept all that stuff hidden
from you.  But PostgreSQL and mainline db experience are often
incompatible.  The very things people are used to using to make other
dbs fast (forcing index usage, for instance) can make postgresql
noticeably slower.

You might find that partial indexes help in some circumstances.  If a
query has a where clause on a field that holds one value 99% of the time
and another value 1% of the time, you can index just that 1%, and an
index scan over it will be ultra quick.  The standard case for that is a
boolean field:

  create table test (id int, info text, btest bool);

  -- load 100,000 rows, with 1% having btest = true and the rest false
  insert into test (id, info, btest)
    select i, 'row ' || i::text, (i % 100 = 0)
      from generate_series(1, 100000) as i;

  create index test_btest_true on test (btest) where btest is true;
  analyze test;

  explain analyze select * from test where btest is true;

Generally, postgresql offers different ways to solve the same problems
as other databases, and knowing those ways can really help you
troubleshoot and fix poorly performing queries.
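P.S.  Here's roughly what I meant about testing with enable_seqscan
off.  Something like this keeps the setting scoped to a single
transaction, so nothing else on the system is affected (substitute your
own slow query for the select, obviously):

  begin;
  set local enable_seqscan = off;
  explain analyze select ... ;   -- your slow query here
  rollback;

If the plan that forces is genuinely faster, that's good evidence the
planner's cost estimates, or your statistics, need attention.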
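P.P.S.  The vacuum throttling knobs I mentioned are the vacuum_cost_*
settings in postgresql.conf.  The numbers below are only a starting
point to illustrate the idea, not a recommendation; tune them against
your own I/O load:

  vacuum_cost_delay = 20      # ms to sleep each time the cost limit is
                              # reached (default 0 = no throttling)
  vacuum_cost_limit = 200     # how much "work" vacuum may do between
                              # sleeps (this is the default)

A non-zero vacuum_cost_delay makes vacuum take longer but spreads its
I/O out so it doesn't stomp on your queries.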