SELECT INTO large FKyed table is slow

The database for monitoring certain drone statuses is quite simple:

CREATE TABLE samples (
	sample_id integer not null primary key,
	sample_timestamp timestamp not null default now()
);

CREATE TABLE drones (
	drone_id integer not null primary key,
	drone_log_notice character varying,
	drone_coordinates point not null,
	drone_temperature float,
	drone_pressure float
);

CREATE TABLE drones_history (
	drone_id integer not null,
	sample_id integer not null,
	drone_log_notice character varying,
	drone_temperature float,
	drone_pressure float,
	constraint drones_history_pk primary key (drone_id, sample_id),
	constraint drones_history_fk__samples foreign key (sample_id) references samples(sample_id),
	constraint drones_history_fk__drones foreign key (drone_id) references drones(drone_id)
);

Every ten to twenty minutes I receive a CSV file with most of the drones' statuses. The CSV file also includes data for new drones as they're put into use. When I receive new data I load the whole CSV file into the database, then call a stored procedure that 'deals' with that data.
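
(For completeness, the load step is just a plain COPY into the staging table - the file path and CSV options below are only an example, not my exact command:)

COPY temporary.drones (drone_id, log_notice, coord_x, coord_y, temp, press)
FROM '/path/to/drones.csv' WITH CSV;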

So far I have around 6000 samples, around 160k drones and drones_history is around 25M rows.

The CSV file contains around 15k-20k 'rows', mostly data about old drones. Every now and then (on every 5th-10th CSV insert) there is data for around 1000-5000 new drones.

Here is what I do in the stored procedure, after I've COPYed the data from the CSV into the temporary.drones table:

First, inside the procedure, I create a temporary table that holds the rows for the new drones:

CREATE TEMPORARY TABLE tmpNew ON COMMIT DROP AS
SELECT drone_id, log_notice, coord_x, coord_y, temp, press
FROM temporary.drones t
WHERE NOT EXISTS (SELECT 1 FROM public.drones p WHERE p.drone_id = t.drone_id);

This is done in milliseconds, even if the count of new drones is large (I've tested it with 10k new drones, although in real-life operation I'd never get more than 5k new drones per CSV).

INSERT INTO public.drones (drone_id, drone_log_notice, drone_coordinates, drone_temperature, drone_pressure)
SELECT drone_id, log_notice, point(coord_x, coord_y), temp, press
FROM tmpNew;

INSERT INTO public.drones_history (sample_id, drone_id, drone_log_notice, drone_temperature, drone_pressure)
SELECT a_sample_id, drone_id, log_notice, temp, press
FROM tmpNew;

This is also done in milliseconds.

Now I 'update' the data for the existing drones, and fill in the history table for those drones. First I create a temporary table with just the changed rows:

CREATE TEMPORARY TABLE tmpUpdate ON COMMIT DROP AS
SELECT a_sample_id, t.drone_id, t.log_notice, t.temp, t.press
   FROM temporary.drones t
   JOIN public.drones p
   ON t.drone_id = p.drone_id
WHERE p.drone_log_notice != t.log_notice OR p.drone_temperature != t.temp OR p.drone_pressure != t.press;

Now, that part is also fast. I usually have around 100-1000 drones that changed 'state', but sometimes even half of the drones change state (around 50k), and creating tmpUpdate still takes no more than ten to twenty milliseconds.

This is the slow part:
INSERT INTO drones_history (sample_id, drone_id, drone_log_notice, drone_temperature, drone_pressure)
SELECT * FROM tmpUpdate;

For 100 rows this takes around 2 seconds. For 1000 rows it takes around 40 seconds. For 5000 rows it takes around 5 minutes. For 50k rows it takes around 30 minutes! This is where I start to lag, because I get a new CSV every 10 minutes or so.
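
(If it would help with diagnosing this, the FK overhead can be made visible by running the insert under EXPLAIN ANALYZE - the time spent in the drones_history_fk__samples and drones_history_fk__drones triggers is listed at the end of the output. Note that this really executes the INSERT:)

EXPLAIN ANALYZE
INSERT INTO drones_history (sample_id, drone_id, drone_log_notice, drone_temperature, drone_pressure)
SELECT * FROM tmpUpdate;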

And the last part is to update the actual drones table:
UPDATE public.drones p
SET drone_log_notice = t.log_notice, drone_temperature = t.temp, drone_pressure = t.press
FROM temporary.drones t
WHERE t.drone_id = p.drone_id
AND (t.log_notice != p.drone_log_notice OR t.temp != p.drone_temperature OR t.press != p.drone_pressure);

This is also very fast: even when almost half the table is updated, the UPDATE takes around 10 seconds. Usually it's around 30-50 ms.

The machine I'm doing this on has 4 GB of RAM and dual Xeons (3 GHz something). Two SAS drives in a mirror, capable of around 100 MB/s in sequential r/w (I know that doesn't say much, but just to give an idea).

The database is around 2 GB in size (per pg_database_size). When I dump/recreate the database I can speed things up a bit, but after half a day of operation the INSERTs are slow again. When I do a dump/restore of the database I get around 40-50 MB/sec reading/writing from the disks (COPYing data, PK/FK constraint creation), but when that INSERT gets stuck the io-wait goes through the roof - iostat shows that Postgres is mainly reading from the disks, at around 800 kB/s - 1024 kB/s.
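
(The 2 GB figure is from something along the lines of:)

SELECT pg_size_pretty(pg_database_size(current_database()));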

I've set shared_buffers to 256MB, work_mem to 96MB, wal_buffers to 16MB and checkpoint_segments to 16. I've turned off autovacuum; I do an analyze/vacuum after each insert job is done, after TRUNCATEing the temporary.drones table.
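
(In postgresql.conf terms that is roughly:)

shared_buffers = 256MB
work_mem = 96MB
wal_buffers = 16MB
checkpoint_segments = 16
autovacuum = off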

Out of despair I tried to set fsync=off, but that gave me just a small performance improvement.

When I remove the foreign key constraints (drones_history_fk__samples and drones_history_fk__drones) but leave the primary key on drones_history, that INSERT, even for 50k rows, takes no more than a second.
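
(By removing the constraints I mean roughly the following. Re-adding them afterwards re-checks every existing row in drones_history, which is why dropping and re-creating them on every load doesn't look attractive:)

ALTER TABLE drones_history DROP CONSTRAINT drones_history_fk__samples;
ALTER TABLE drones_history DROP CONSTRAINT drones_history_fk__drones;
-- ... bulk INSERT into drones_history ...
ALTER TABLE drones_history ADD CONSTRAINT drones_history_fk__samples
	FOREIGN KEY (sample_id) REFERENCES samples (sample_id);
ALTER TABLE drones_history ADD CONSTRAINT drones_history_fk__drones
	FOREIGN KEY (drone_id) REFERENCES drones (drone_id);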

So, my question is - is there anything I can do to make these INSERTs faster while keeping the constraints? Or, since all the reference checking is done inside the procedure that loads the data, should I abandon those constraints entirely?

	Mario



