On 14/07/2015 18:50, Igor Stassiy wrote: > Julien, I would gladly provide more information, I am just not sure what > to add. > Well, was your concern about why option #2 is the quickest, or is this runtime with option #2 still too slow for you ? > I would be willing to leave the server compromised for things like > corrupts or data losses during the time of this import, but the server > has to be up and running before and after the import, if it is > successful (so I can't take it down then change some parameters and > start it up with again). Check http://www.postgresql.org/docs/current/static/sql-createtable.html and the "UNLOGGED" part to check if an unlogged table is suitable for you. > > On Tue, Jul 14, 2015 at 6:37 PM Julien Rouhaud > <julien.rouhaud@xxxxxxxxxx <mailto:julien.rouhaud@xxxxxxxxxx>> wrote: > > On 14/07/2015 18:21, Igor Stassiy wrote: > > Julien, I have the following setting for WAL level: #wal_level = > minimal > > (which defaults to minimal anyway) > > > > Sorry, I sent my mail too early :/ > > So, option #2 is winner by design. You didn't say anything about your > needs, so it's hard to help you much more. > > If you don't care about losing data on this table if your server > crashes, you can try option #3 with an unlogged table. > > > > On Tue, Jul 14, 2015 at 6:19 PM Julien Rouhaud > > <julien.rouhaud@xxxxxxxxxx <mailto:julien.rouhaud@xxxxxxxxxx> > <mailto:julien.rouhaud@xxxxxxxxxx > <mailto:julien.rouhaud@xxxxxxxxxx>>> wrote: > > > > On 14/07/2015 11:12, Igor Stassiy wrote: > > > Hello, > > > > > > I am benchmarking different ways of putting data into table > on table > > > creation: > > > > > > 1. INSERT INTO c SELECT * FROM a JOIN b on a.id > <http://a.id> <http://a.id> > > <http://a.id> = b.id <http://b.id> <http://b.id> > > > <http://b.id>; > > > 2. CREATE TABLE c AS SELECT * FROM a JOIN b on a.id > <http://a.id> <http://a.id> > > <http://a.id> = b.id <http://b.id> <http://b.id> > > > <http://b.id>; > > > 3. psql -c "COPY (SELECT * FROM a JOIN b on a.id > <http://a.id> <http://a.id> > > <http://a.id> = b.id <http://b.id> <http://b.id> > > > <http://b.id>) TO STDOUT" | > > > parallel --block 128M --jobs 4 --pipe psql -c "COPY c FROM > STDIN"; > > > > > > (the parallel command is available as part of parallel deb > package in > > > Ubuntu for example, it splits the stdin by newline character > and feeds > > > it to the corresponding command) > > > > > > Both tables a and b have ~16M records and one of the columns > in a is > > > geometry (ranging from several KB in size to several MB). > Columns in b > > > are mostly integers. > > > > > > The machine that I am running these commands on has the > following > > > parameters: > > > > > > default_statistics_target = 50 # pgtune wizard 2012-06-06 > > > maintenance_work_mem = 1GB # pgtune wizard 2012-06-06 > > > constraint_exclusion = on # pgtune wizard 2012-06-06 > > > checkpoint_completion_target = 0.9 # pgtune wizard 2012-06-06 > > > effective_cache_size = 48GB # pgtune wizard 2012-06-06 > > > work_mem = 80MB # pgtune wizard 2012-06-06 > > > wal_buffers = 8MB # pgtune wizard 2012-06-06 > > > checkpoint_segments = 16 # pgtune wizard 2012-06-06 > > > shared_buffers = 16GB # pgtune wizard 2012-06-06 > > > max_connections = 400 # pgtune wizard 2012-06-06 > > > > > > One would expect the 3rd option to be faster than 1 and 2, > however 2 > > > outperforms both by a large margin (sometimes x2). This is > especially > > > surprising taking into account that COPY doesn't acquire a > global lock > > > on the table, only a RowExclusiveLock > > > (according > > > to > > > http://www.postgresql.org/message-id/10611.1014867684@xxxxxxxxxxxxx) > > > > > > > What is wal_level value? I think this is because of an > optimisation > > happening with wal_level = minimal: > > > > "In minimal level, WAL-logging of some bulk operations can be > safely > > skipped, which can make those operations much faster" > > > > see > > > http://www.postgresql.org/docs/current/static/runtime-config-wal.html > > > > > So is option 2 a winner by design? Could you please suggest > other > > > alternatives to try (if there are any)? And what might be > the reason > > > that 3 is not outperforming the other 2? > > > > > > Thank you, > > > Igor > > > > > > > > > > > > -- > > Julien Rouhaud > > http://dalibo.com - http://dalibo.org > > > > > -- > Julien Rouhaud > http://dalibo.com - http://dalibo.org > -- Julien Rouhaud http://dalibo.com - http://dalibo.org -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general