This should help... In each temporary table, convert the time parts to a timestamp, then create an index on each of these, then join on the timestamp:

ALTER TABLE rmc ADD COLUMN timer timestamp without time zone;
UPDATE rmc
   SET timer = (utc_year || '-' || utc_month || '-' || utc_day || ' ' ||
                utc_hour || ':' || utc_minute || ':' || utc_second)::timestamp;
CREATE INDEX rmc_timer_idx ON rmc(timer);

ALTER TABLE gyro ADD COLUMN timer timestamp without time zone;
UPDATE gyro
   SET timer = (utc_year || '-' || utc_month || '-' || utc_day || ' ' ||
                utc_hour || ':' || utc_minute || ':' || utc_second)::timestamp;
CREATE INDEX gyro_timer_idx ON gyro(timer);

Then something like this should work if you use PostGIS, which I recommend for GPS data:

SELECT DISTINCT ON (project_id, platform_id, supplier_id, timer)
       2 AS project_id,
       1 AS platform_id,
       6 AS supplier_id,
       m.timer,
       m.latitude,
       m.longitude,
       ST_SetSRID(ST_MakePoint(m.longitude, m.latitude), 4326) AS location,
       m.sog AS speed_over_ground,
       m.cog AS course_over_ground,
       g.heading
  FROM rmc m
  JOIN gyro g ON m.timer = g.timer
 ORDER BY project_id, platform_id, supplier_id, timer;

One comment: if either table has times recorded at better than 1 sec precision (i.e., more than one value per second), you might join on avg() values with a GROUP BY to bring the output down to 1 sec resolution.

Cheers,

Brent Wood
Programme leader: Environmental Information Delivery
NIWA
DDI: +64 (4) 3860529

________________________________________
From: pgsql-general-owner@xxxxxxxxxxxxxx [pgsql-general-owner@xxxxxxxxxxxxxx] on behalf of Seb [spluque@xxxxxxxxx]
Sent: Tuesday, December 31, 2013 2:53 PM
To: pgsql-general@xxxxxxxxxxxxxx
Subject: bulk loading table via join of 2 large staging tables

Hi,

I have two large CSV files that need to be merged and loaded into a single table of a database in PostgreSQL 9.3.
I thought I'd do this by first staging the data in these files in two temporary tables:

---<--------------------cut here---------------start------------------->---
CREATE TEMPORARY TABLE rmc (
    utc_year   character varying(6),
    utc_month  character varying(4),
    utc_day    character varying(4),
    utc_hour   character varying(4),
    utc_minute character varying(4),
    utc_second character varying(8),
    latitude   numeric,
    longitude  numeric,
    sog        numeric,
    cog        numeric);

CREATE TEMPORARY TABLE gyro (
    utc_year   character varying(6),
    utc_month  character varying(4),
    utc_day    character varying(4),
    utc_hour   character varying(4),
    utc_minute character varying(4),
    utc_second character varying(8),
    heading    numeric);
---<--------------------cut here---------------end--------------------->---

And the target table in the database looks like this:

---<--------------------cut here---------------start------------------->---
                                Table "public.navigation_series"
        Column        |            Type             |                             Modifiers
----------------------+-----------------------------+----------------------------------------------------------------------------------
 navigation_record_id | integer                     | not null default nextval('navigation_series_navigation_record_id_seq'::regclass)
 project_id           | integer                     |
 platform_id          | integer                     |
 supplier_id          | integer                     |
 time                 | timestamp without time zone | not null
 longitude            | numeric                     |
 latitude             | numeric                     |
 speed_over_ground    | numeric                     |
 course_over_ground   | numeric                     |
 heading              | numeric                     |
Indexes:
    "navigation_series_pkey" PRIMARY KEY, btree (navigation_record_id)
    "navigation_series_project_id_platform_id_supplier_id_time_key" UNIQUE CONSTRAINT, btree (project_id, platform_id, supplier_id, "time")
Foreign-key constraints:
    "navigation_project_id_fkey" FOREIGN KEY (project_id) REFERENCES projects(project_id) ON UPDATE CASCADE ON DELETE RESTRICT
    "navigation_series_platform_id_fkey" FOREIGN KEY (platform_id) REFERENCES platforms(platform_id) ON UPDATE CASCADE ON DELETE RESTRICT
    "navigation_series_supplier_id_fkey" FOREIGN KEY (supplier_id)
        REFERENCES suppliers(supplier_id) ON UPDATE CASCADE ON DELETE RESTRICT
---<--------------------cut here---------------end--------------------->---

Loading the temporary tables was very quick (about 3 min; the input files were 580 MB and 3.5 GB) in psql, using:

\copy gyro FROM 'gyro.csv' CSV
\copy rmc FROM 'rmc.csv' CSV

I then created a temporary view with:

CREATE TEMPORARY VIEW rmc_gyro AS
    SELECT DISTINCT ON (project_id, platform_id, supplier_id, "time")
           2 AS project_id, 1 AS platform_id, 6 AS supplier_id,
           (utc_year || '-' || utc_month || '-' || utc_day || ' ' ||
            utc_hour || ':' || utc_minute || ':' || utc_second)::timestamp AS "time",
           longitude, latitude,
           sog AS speed_over_ground,
           cog AS course_over_ground,
           heading
      FROM rmc
      FULL JOIN gyro USING (utc_year, utc_month, utc_day,
                            utc_hour, utc_minute, utc_second)
     ORDER BY project_id, platform_id, supplier_id, "time";

But at this point even selecting just a few rows from the view is too slow (I haven't seen any output after many hours). Given that the process involves a full join, I'm not sure I can do this in chunks (say, by breaking the files into smaller pieces). Any suggestions would be greatly appreciated.

Cheers,

--
Seb

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Please consider the environment before printing this email. NIWA is the trading name of the National Institute of Water & Atmospheric Research Ltd.
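The avg()/GROUP BY idea mentioned in the reply could be sketched like this. This is an illustrative sketch only: it assumes the derived timer columns created by the earlier ALTER/UPDATE statements, and it uses date_trunc() to bucket each table's readings into whole seconds before joining (plain arithmetic means per second).

---<--------------------cut here---------------start------------------->---
-- Collapse each staging table to one averaged row per second, then join.
WITH r AS (
  SELECT date_trunc('second', timer) AS sec,
         avg(latitude)  AS latitude,
         avg(longitude) AS longitude,
         avg(sog)       AS sog,
         avg(cog)       AS cog
    FROM rmc
   GROUP BY 1),
g AS (
  SELECT date_trunc('second', timer) AS sec,
         avg(heading) AS heading
    FROM gyro
   GROUP BY 1)
SELECT r.sec AS "time",
       r.latitude, r.longitude,
       r.sog AS speed_over_ground,
       r.cog AS course_over_ground,
       g.heading
  FROM r
  JOIN g USING (sec);
---<--------------------cut here---------------end--------------------->---

One caveat with this approach: a plain avg() of angular columns such as cog and heading gives misleading results for readings that straddle the 0/360-degree wrap, so those seconds may need special handling.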