On Tue, Jan 15, 2013 at 3:44 PM, Horst Dehmer <horst.dehmer@xxxxxxxxx> wrote:

> idx_4 together with a simple select in the tables on-insert trigger is
> slowing things down considerably.

So the theory is that the presence of idx_4 is causing the trigger to pick a poor plan (i.e. one using idx_4), while its absence removes that temptation?

> pg_statio comes up with same big numbers (reads = bad, hits = not so bad?):

True disk reads are much more expensive, but given how few reads you have relative to hits, I now think that in aggregate the hits are more of a concern than the reads are. In other words, you seem to be CPU bound, not IO bound.

Even more so, I think (but not with much confidence) that most of your "reads" are actually coming from the OS cache and not from the disk. PG cannot distinguish true disk reads from OS cache reads.

When was the last time you reset the stats? That is, are your reported numbers accumulated over several loads, with some having idx_4 and some not?

...

> Now I have some (more) questions:
>
> 1. How do I know which index (if any) is chosen for a select statement
> inside a trigger during a bulk load transaction? (or for that matter: a
> series of recursive plpgsql functions)

Informally, reset your database to the state it was in before the load, analyze it, and run the EXPLAIN again before you do the load.

More formally, use auto_explain and set auto_explain.log_nested_statements to true. I haven't verified that this works with triggers; just going by the description, I think it should.

> 2. The query planner depends on stats collected by auto-vacuum/vacuum
> analyze, right? Does stats collecting also happen during a lengthy
> transaction?

My understanding is that a transaction will not dump its stats until the commit, so the auto-analyze will not occur *due to* the lengthy transaction until after it is over. But if the table was already due for an analyze anyway, due to previous or concurrent shorter transactions, the analyze will happen.
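Whether an analyze has actually fired for a given table, and when, can be checked from the cumulative statistics views; a quick sketch (the table name `table_a` here is just a placeholder for one of your own tables):

```sql
-- Last manual ANALYZE and last autoanalyze for one table:
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'table_a';

-- To start the counters fresh before the next load (answers the
-- "when did you last reset the stats?" question going forward):
-- SELECT pg_stat_reset();
```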
However, the lengthy transaction might not see the results of the analyze (I'm not clear on the transaction snapshot semantics of the statistics tables), and even if it did see them, it might just be using cached plans and so would not change the plan in the middle.

> 3. Is it possible (or even advisable) to trigger vacuum analyze inside an
> ongoing transaction? Let's say load 10,000 rows of table A, analyze table A,
> insert the next 10,000 rows, analyze again, ...

You can't VACUUM inside a transaction. You can ANALYZE, but I don't know if it would be advisable.

Your use case is a little unusual. If you are bulk loading into an initially empty table, usually you would remove the trigger and add it back after the load (with some kind of bulk operation to make up for whatever it was the trigger would have been doing). On the other hand, if you are bulk loading into a "live" table and so can't drop the trigger, then the live table should have good-enough preexisting statistics to make the trigger choose a good plan.

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
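P.S. Re: question 3, the batch-and-analyze pattern itself is easy to sketch, since ANALYZE (unlike VACUUM) is allowed inside a transaction block. The table name and file paths below are placeholders, not anything from your schema:

```sql
BEGIN;

-- first batch of rows
COPY table_a FROM '/path/to/batch1.csv' WITH (FORMAT csv);

-- ANALYZE is legal here; VACUUM would raise
-- "VACUUM cannot run inside a transaction block"
ANALYZE table_a;

-- next batch; statements planned from here on can use the
-- fresher statistics (unless they reuse already-cached plans)
COPY table_a FROM '/path/to/batch2.csv' WITH (FORMAT csv);

COMMIT;
```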