Jake Magner <jakemagner90@xxxxxxxxx> writes: > I am trying to insert rows that don't already exist from a temp table into > another table. I am using a LEFT JOIN on all the columns and checking for > nulls in the base table to know which rows to insert. The problem is that > the planner is choosing a nested loop plan which is very slow over the much > faster (~40x) hash join. What's interesting is that all the row estimates > appear to be fairly accurate. I'm wondering if it has something to do with > the GIN indexes on bigint_array_1 and bigint_array_2. Perhaps it > misestimates the cost of each index scan? I'm curious about what happens to the runtime if you repeatedly roll back the INSERT and do it over (without vacuuming in between). What I'm thinking is that as the INSERT proceeds, it'd be making entries in those GIN indexes' pending-item lists, which the bitmap indexscan would have to scan through since it's examining the same table you're inserting into. The pending-item list is unsorted so it's relatively expensive to scan. Since, after you insert each row from temp_rows_to_insert, you're doing a fresh bitmap indexscan, it seems like the cost to deal with the pending item list would be proportional to O(N^2) --- so even though the cost per pending item is not that large, N=4000 might be enough to hurt. If this theory is correct, a second attempt to do the INSERT without having flushed the pending-list would be even more expensive; while if I'm wrong and that cost is negligible, the time wouldn't change much. The hashjoin approach avoids this problem by not using the index (and even if it did, the index would be scanned only once before any insertions happen). The planner unfortunately has no idea about this interaction. If this diagnosis is correct, there are a couple ways you could get around the problem: * disable use of the pending list by turning off "fastupdate" for these indexes. * construct the set of rows to be inserted in a separate command and put them into a second temp table, then insert to the main table. The second choice is probably preferable; doing bulk GIN inserts without fastupdate is kind of expensive itself. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance