On Tue, Feb 9, 2016 at 4:09 PM, Dan Langille <dan@xxxxxxxxxxxx> wrote:
> I have a wee database server which regularly tries to insert 1.5 million or
> even 15 million new rows into a 400 million row table.  Sometimes these
> inserts take hours.
>
> The actual query to produce the join is fast.  It's the insert which is
> slow.
>
> INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
>   SELECT batch_testing.FileIndex, batch_testing.JobId, Path.PathId,
>          Filename.FilenameId, batch_testing.LStat, batch_testing.MD5,
>          batch_testing.DeltaSeq
>   FROM batch_testing
>        JOIN Path     ON (batch_testing.Path = Path.Path)
>        JOIN Filename ON (batch_testing.Name = Filename.Name);
>
> This is part of the plan: http://img.ly/images/9374145/full created via
> http://tatiyants.com/pev/#/plans
>
> This gist contains postgresql.conf, zfs settings, slog, disk partitions:
>
> https://gist.github.com/dlangille/33331a8c8cc62fa13b9f

The table you are inserting into has 7 indexes, all of which have to be
maintained.  The index on the sequence column should be efficient to
maintain.  But for the rest, if the inserted rows are not naturally ordered
by any of the indexed columns, then each insert ends up reading 6 randomly
scattered leaf pages in order to insert the row pointers.  If none of those
pages are in memory, they are going to be slow to read off the HDD one at a
time.  Also, you are going to dirty all of those scattered pages, and they
will be slow to write back to the HDD because there probably won't be much
opportunity for write-combining.

Do you really need all of those indexes?

Won't the index on (jobid, pathid, filenameid) service any query that
(jobid) does, so you can get rid of the latter?

And unless you have range queries on fileindex, like "where jobid = 12 and
fileindex between 4 and 24", then you should be able to replace
(jobid, fileindex) with (fileindex, jobid) and then get rid of the
stand-alone index on (fileindex).
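For concreteness, the index consolidation above might look something like
the sketch below.  The index names here are guesses (check the real ones
with \d file first), and note that plain DROP/CREATE INDEX takes locks on
the table; CREATE INDEX CONCURRENTLY avoids that but cannot run inside a
transaction block.

```sql
-- Sketch only: index names below are hypothetical, verify with \d file.
BEGIN;
DROP INDEX file_jobid_idx;            -- (jobid) is a prefix of (jobid, pathid, filenameid)
DROP INDEX file_jobid_fileindex_idx;  -- replaced by the reordered index below
DROP INDEX file_fileindex_idx;        -- its queries can use the new leading column
CREATE INDEX file_fileindex_jobid_idx ON file (fileindex, jobid);
COMMIT;
```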
If you add an ORDER BY to the SELECT statement which orders by the columns
of one of the remaining indexes, then you can make the maintenance of that
index much cheaper.

Could you move the indexes for this table to SSD?

SSD is probably wasted on your WAL.  If your main concern is bulk
insertions, then the WAL is going to be written sequentially with few
fsyncs.  That is ideal for HDD.  Even if you also have smaller
transactions, the WAL is still written sequentially as long as you have a
non-volatile cache on your RAID controller which can absorb fsyncs
efficiently.

Cheers,

Jeff

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
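As a concrete sketch of the ORDER BY suggestion: if the (jobid, pathid,
filenameid) index is the one you keep, ordering the bulk insert by those
same columns means its leaf pages are touched sequentially rather than at
random.

```sql
-- Sketch: same insert as above, ordered to match the surviving index.
INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
  SELECT batch_testing.FileIndex, batch_testing.JobId, Path.PathId,
         Filename.FilenameId, batch_testing.LStat, batch_testing.MD5,
         batch_testing.DeltaSeq
  FROM batch_testing
       JOIN Path     ON (batch_testing.Path = Path.Path)
       JOIN Filename ON (batch_testing.Name = Filename.Name)
  ORDER BY batch_testing.JobId, Path.PathId, Filename.FilenameId;
```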