Re: SELECT INTO large FKyed table is slow

On 11/29/2010 05:53 PM, Pierre C wrote:

Yes, since (sample_id, drone_id) is the primary key, Postgres created a
composite index on those columns. Are you suggesting I add two more
indexes, one on drone_id and one on sample_id?

(sample_id, drone_id) covers sample_id, but if you search on drone_id
alone it is likely to be very slow, since you have a large number of
sample_ids. Postgres can use any column of a multicolumn index, but that
is only interesting performance-wise if the cardinality of the preceding
(skipped) columns is low. If you often search on drone_id, create an
index for it. But this isn't what is slowing down your foreign key checks.

Again, you have a point there. When I get to SELECTs on the history table I'll be doing most of the filtering on drone_id (but also on sample_id, because I'll seldom drill all the way back in time; I'll usually be interested in just certain periods), so I'll take this into consideration.

But, as you've said, that's not what's slowing my FK checks.
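For those drill-down queries, something along these lines is what I have in mind; just a sketch for now, and the index name and the drone_id-first column order are my own assumptions, not something I've tested:

-- Sketch: index led by drone_id, since most filtering will be on it,
-- with sample_id second so a period of samples for one drone can be
-- range-scanned within the index.
CREATE INDEX drones_history_drone_sample_idx
    ON drones_history (drone_id, sample_id);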


Also it would be worthwhile for you to post the output of:

EXPLAIN ANALYZE INSERT INTO drones_history (sample_id, drone_id,
drone_log_notice, drone_temperature, drone_pressure)
SELECT * FROM tmpUpdate;

to the list, so we can see what is taking the time.

Is there a way to do so inside a plpgsql function?

I can recreate the whole process within psql and then post the EXPLAIN
ANALYZE; it will just take me some time to do so. I'll post as soon
as I'm done.

Yes, this would be interesting.
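As an aside, one way to capture plans for statements run inside a plpgsql function, assuming you can load server-side modules, is the auto_explain contrib module; the plans then show up in the server log rather than in psql. A minimal sketch:

LOAD 'auto_explain';                            -- load the module into this session
SET auto_explain.log_min_duration = 0;          -- log the plan of every statement
SET auto_explain.log_analyze = true;            -- include actual timings and row counts
SET auto_explain.log_nested_statements = true;  -- cover statements run inside functions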

So, I did. I ran the whole script in psql, and here is the result for the INSERT:

realm_51=# EXPLAIN ANALYZE INSERT INTO drones_history (sample_id, drone_id, drone_log_notice, drone_temperature, drone_pressure)
           SELECT 2771, drone_id, drone_log_notice, drone_temperature, drone_pressure FROM tmp_drones_history;
                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Seq Scan on tmp_drones_history  (cost=0.00..81.60 rows=4160 width=48) (actual time=0.008..5.296 rows=5150 loops=1)
 Trigger for constraint drones_history_fk__drones: time=92.948 calls=5150
 Total runtime: 16779.644 ms
(3 rows)


Now, this is only 16 seconds for the 5150 rows in this batch, roughly 3.3 ms per row.
The batch before, which I ran 'the usual way', inserted 9922 rows and took 1 minute and 16 seconds, roughly 7.7 ms per row.

I did not, however, wrap the process in a transaction (BEGIN/END).
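In psql that just means something like this around the INSERT (END being Postgres's alias for COMMIT):

BEGIN;
-- the same INSERT INTO drones_history ... SELECT FROM tmp_drones_history as above
END;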

So, here are the results when I, in psql, first issued BEGIN:

realm_51=# EXPLAIN ANALYZE INSERT INTO drones_history (sample_id, drone_id, drone_log_notice, drone_temperature, drone_pressure)
           SELECT 2772, drone_id, drone_log_notice, drone_temperature, drone_pressure FROM tmp_drones_history;
                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Seq Scan on tmp_drones_history  (cost=0.00..79.56 rows=4056 width=48) (actual time=0.008..6.490 rows=5059 loops=1)
 Trigger for constraint drones_history_fk__drones: time=120.224 calls=5059
 Total runtime: 39658.250 ms
(3 rows)

Time: 39658.906 ms
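In case someone wants to reproduce this, here is a sketch of the tables involved as discussed in this thread; the column types are my shorthand, not necessarily the exact ones:

CREATE TABLE drones (
    drone_id integer PRIMARY KEY
    -- other columns not relevant to this test
);

CREATE TABLE drones_history (
    sample_id         integer,
    drone_id          integer,
    drone_log_notice  text,
    drone_temperature numeric,
    drone_pressure    numeric,
    PRIMARY KEY (sample_id, drone_id),
    CONSTRAINT drones_history_fk__drones
        FOREIGN KEY (drone_id) REFERENCES drones (drone_id)
);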



	Mario


