On 9/27/07, Jan Theodore Galkowski <bayesianlogic@xxxxxxx> wrote: > I fear this has been asked many times about PostgreSQL, and I have read > the docs about how indexes are supposed to be defined and used, but I > don't understand why the engine and optimizer is doing what it does in > the simplest of situations. Is it that its tuning is heavily data > dependent? This has in fact been discussed many times on the mailing list. Searching the archives will probably return lots of discussions. > Like *how* *come*? There are indexes on both columns of the join. Is > it the NUMERIC datatype messing things up? Unlikely, as I've seen the > same with INTEGERs. Postgresql doesn't store "visibility" information in indexes. this means that once you find the entry in an index, you then have to see if it's visible to the current transaction, and that information is only stored in the tables. And there are lots of discussions of why that is in the archives as well. Basically race conditions make it impossible to update the table and index concurrently without ugly locking issues popping up. So, in pgsql, whether there's an index or not, the db has to hit the table in the end. > If it is data dependent (these tables are presently empty), any > suggestions as to how to tune a database for unknown mixes of data? No it isn't. It is range dependent. If you had a selective enough where clause then postgresql would choose an index over a sequential scan. Your biggest mistake here is thinking the simple solution (use indexes) is always best. PostgreSQL uses a cost based planner that tries to decide ahead of time what plan is going to be fastest. The real answer is to give it good information (i.e. run analyze frequently enough, and have a high enough stats target for the column(s) you're using) That means pgsql is paying attention to how big your tables are as well as what values are in there and what % you're going to get back. Use explain analyze to see the differences between what the planner expects and what it gets. Like this part of your explain analyze output: Seq Scan on foo a (cost=0.00..11.80 rows=180 width=407) (actual time=0.003..0.003 rows=0 loops=1) note that the planner expected 180 rows but got 0. that's a sign of poor stats. Run analyze and you should see something closer to a match between expected and actual rows. Also, try putting some real data in your db, and using a where clause (unless you really are gonna grab every single row every time...) ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings