Re: Slow Inserts on 1 table?

Dan Armbrust wrote:


What, ALWAYS faster, even for the first FK check when there's only one row in the target table and that's cached?

If you're really in a hurry doing your bulk loads:
 1. Use COPY.
 2. Drop/restore the foreign-key constraints before/after.
That will be hugely faster than INSERTs, although it's not always an applicable solution.
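
A minimal sketch of the COPY route, purely for illustration (table name and file path are made up):

    -- instead of thousands of individual INSERTs ...
    --   INSERT INTO items (id, name) VALUES (1, 'widget');
    -- ... load the whole batch with one statement (server-side file, tab-delimited by default):
    COPY items FROM '/tmp/items.dat';
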
--
  Richard Huxton
  Archonet Ltd

It seems like the query planner goes to great lengths to avoid using indexes because it might take 5 ms longer to execute an index lookup on a table with one row. But then, when the table has 1 million rows, a full scan takes 3 minutes, and an index scan takes 3 seconds, it has no problem picking the 3-minute route. I'll gladly give up the 5 ms in return for not having to wait 3 minutes, which is why I disabled the sequential scans. If I have a small table, where indexes won't speed things up, I won't build an index on it.

The other factor is that most of my tables have at least thousands, and usually millions, of rows. Sequential scans will never be faster for the queries that I am doing - like I said, that is why I created the indexes.
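
For reference, "disabled the sequential scans" presumably means a planner setting along these lines (an assumption about how it was done):

    SET enable_seqscan = off;  -- session-level planner hint; discourages, but cannot fully forbid, seq scans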

The issue has nothing to do with special "small table" handling code. It's all to do with not having up-to-date stats. Of course, once you've analysed your table, the system knows your index is good.
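
Keeping the stats fresh is just a matter of something like (table name made up):

    ANALYZE items;  -- refreshes the planner's row-count and value-distribution statistics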

My loading is done programmatically, from another format, so COPY is not an option.

Why not? A lot of my bulk loads are generated from other systems, and I go through a temporary file/pipe via COPY when I can. When I don't, I batch inserts into groups of e.g. 1000 and stick in an analyse etc. as required.
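
For illustration, the generated rows can be written to a flat file and pulled in from the client side with psql's \copy, so no server-side file access is needed (file and table names are hypothetical):

    \copy items from '/tmp/generated_items.dat'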

Neither is removing foreign keys, as they are required to guarantee valid data.

Ah, but you can still guarantee your data. You can wrap the whole drop-FK, bulk-load, recreate-FK sequence in a single transaction, and it can still be faster. Obviously doing this on a high-activity table won't win, though, since you'll have to block everyone else doing updates.
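
A minimal sketch of that pattern (constraint, table, and file names are made up; re-adding the constraint re-checks every row, so the guarantee still holds):

    BEGIN;
    ALTER TABLE orders DROP CONSTRAINT orders_customer_fk;
    COPY orders FROM '/tmp/orders.dat';
    ALTER TABLE orders ADD CONSTRAINT orders_customer_fk
        FOREIGN KEY (customer_id) REFERENCES customers (id);  -- validates all existing rows
    COMMIT;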

I don't really have a problem with the insert speed when it is working properly - it is on par with other DBs that I have on the same hardware. The problem is when it stops using the indexes, for no good reason.

For example, last night I kicked off a load process - this morning, it had only managed to make it through about 600,000 rows (split across several tables). After restarting it this morning, it made it through the same data in 30 minutes.
If that's not bad and buggy behavior, I don't know what is....

So run ANALYSE in parallel with your load, or break the bulk-load into blocks and analyse in-line. I'm not sure ripping out PG's cost-based query analyser will be a popular solution just to address bulk-loads.
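
For instance, analysing in-line between blocks might look like this (file and table names are hypothetical):

    COPY items FROM '/tmp/items_batch_0001.dat';
    ANALYZE items;  -- the planner now knows the table has grown
    COPY items FROM '/tmp/items_batch_0002.dat';
    ANALYZE items;
    -- ... and so on for each block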

--
  Richard Huxton
  Archonet Ltd
