On 14/07/2015 11:12, Igor Stassiy wrote: > Hello, > > I am benchmarking different ways of putting data into table on table > creation: > > 1. INSERT INTO c SELECT * FROM a JOIN b on a.id <http://a.id> = b.id > <http://b.id>; > 2. CREATE TABLE c AS SELECT * FROM a JOIN b on a.id <http://a.id> = b.id > <http://b.id>; > 3. psql -c "COPY (SELECT * FROM a JOIN b on a.id <http://a.id> = b.id > <http://b.id>) TO STDOUT" | > parallel --block 128M --jobs 4 --pipe psql -c "COPY c FROM STDIN"; > > (the parallel command is available as part of parallel deb package in > Ubuntu for example, it splits the stdin by newline character and feeds > it to the corresponding command) > > Both tables a and b have ~16M records and one of the columns in a is > geometry (ranging from several KB in size to several MB). Columns in b > are mostly integers. > > The machine that I am running these commands on has the following > parameters: > > default_statistics_target = 50 # pgtune wizard 2012-06-06 > maintenance_work_mem = 1GB # pgtune wizard 2012-06-06 > constraint_exclusion = on # pgtune wizard 2012-06-06 > checkpoint_completion_target = 0.9 # pgtune wizard 2012-06-06 > effective_cache_size = 48GB # pgtune wizard 2012-06-06 > work_mem = 80MB # pgtune wizard 2012-06-06 > wal_buffers = 8MB # pgtune wizard 2012-06-06 > checkpoint_segments = 16 # pgtune wizard 2012-06-06 > shared_buffers = 16GB # pgtune wizard 2012-06-06 > max_connections = 400 # pgtune wizard 2012-06-06 > > One would expect the 3rd option to be faster than 1 and 2, however 2 > outperforms both by a large margin (sometimes x2). This is especially > surprising taking into account that COPY doesn't acquire a global lock > on the table, only a RowExclusiveLock > (according > to http://www.postgresql.org/message-id/10611.1014867684@xxxxxxxxxxxxx) > What is wal_level value? I think this is because of an optimisation happening with wal_level = minimal: "In minimal level, WAL-logging of some bulk operations can be safely skipped, which can make those operations much faster" see http://www.postgresql.org/docs/current/static/runtime-config-wal.html > So is option 2 a winner by design? Could you please suggest other > alternatives to try (if there are any)? And what might be the reason > that 3 is not outperforming the other 2? > > Thank you, > Igor > > -- Julien Rouhaud http://dalibo.com - http://dalibo.org -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general