> On Feb 10, 2016, at 5:13 AM, Dan Langille <dan@xxxxxxxxxxxx> wrote:
>
>> On Feb 10, 2016, at 2:47 AM, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
>>
>> On Tue, Feb 9, 2016 at 4:09 PM, Dan Langille <dan@xxxxxxxxxxxx> wrote:
>>> I have a wee database server which regularly tries to insert 1.5 million or
>>> even 15 million new rows into a 400 million row table. Sometimes these
>>> inserts take hours.
>>>
>>> The actual query that produces the join is fast. It's the insert which is
>>> slow.
>>>
>>> INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
>>>   SELECT batch_testing.FileIndex, batch_testing.JobId, Path.PathId,
>>>          Filename.FilenameId, batch_testing.LStat, batch_testing.MD5,
>>>          batch_testing.DeltaSeq
>>>   FROM batch_testing JOIN Path ON (batch_testing.Path = Path.Path)
>>>                      JOIN Filename ON (batch_testing.Name = Filename.Name);
>>>
>>> This is part of the plan: http://img.ly/images/9374145/full created via
>>> http://tatiyants.com/pev/#/plans
>>>
>>> This gist contains postgresql.conf, zfs settings, slog, disk partitions:
>>>
>>> https://gist.github.com/dlangille/33331a8c8cc62fa13b9f
>>
>> The table you are inserting into has 7 indexes, all of which have to
>> be maintained. The index on the sequence column should be efficient
>> to maintain. But for the rest, if the inserted rows are not naturally
>> ordered by any of the indexed columns, then each insert ends up reading
>> 6 random scattered leaf pages in order to insert the row pointers. If
>> none of those pages are in memory, that is going to be slow to read off
>> the hdd in single file. Also, you are going to dirty all of those
>> scattered pages, and they will be slow to write back to the hdd because
>> there probably won't be much opportunity for write-combining.
>>
>> Do you really need all of those indexes?
>>
>> Won't the index on (jobid, pathid, filenameid) service any query that
>> (jobid) does, so you can get rid of the latter?
>>
>> And unless you have range queries on fileindex, like "where jobid = 12
>> and fileindex between 4 and 24", then you should be able to replace
>> (jobid, fileindex) with (fileindex, jobid) and then get rid of the
>> stand-alone index on (fileindex).
>>
>> If you add an "order by" to the select statement which orders by the
>> fields of one of the remaining indexes, then you could make the
>> maintenance of that index much cheaper.
>
> I will make these changes one-by-one and test each. This will be interesting.

On a test server, the original insert takes about 45 minutes. I removed all indexes: 25 minutes.

Thank you.

-- 
Dan Langille - BSDCan / PGCon
dan@xxxxxxxxxxxx


-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
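To illustrate Jeff's first suggestion: a b-tree index on (jobid, pathid, filenameid) can satisfy any lookup on its leading column, so a separate single-column index on (jobid) is redundant. A sketch of the cleanup (the index names here are hypothetical; the real names on the File table would need to be checked, e.g. with \d file in psql):

```sql
-- Assumed existing composite index; its leading column is jobid, so it
-- can service "WHERE jobid = ?" lookups on its own:
--   CREATE INDEX file_jpfid_idx ON file (jobid, pathid, filenameid);

-- Drop the stand-alone jobid index (hypothetical name) as redundant:
DROP INDEX file_jobid_idx;

-- Likewise, if there are no range queries on fileindex alone, swap the
-- column order so (fileindex, jobid) replaces both (jobid, fileindex)
-- and the stand-alone (fileindex) index:
DROP INDEX file_jobfileindex_idx;
DROP INDEX file_fileindex_idx;
CREATE INDEX file_fileindexjob_idx ON file (fileindex, jobid);
```

Each index removed is one fewer set of scattered leaf pages to read, dirty, and write back per inserted row.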
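Jeff's last point, sketched against the original query: sorting the incoming rows to match one of the surviving indexes means that index's leaf pages are touched in order rather than at random, so each page is read and dirtied once for many rows instead of once per row. Assuming the (jobid, pathid, filenameid) index is kept, the INSERT could become:

```sql
INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
  SELECT batch_testing.FileIndex, batch_testing.JobId, Path.PathId,
         Filename.FilenameId, batch_testing.LStat, batch_testing.MD5,
         batch_testing.DeltaSeq
  FROM batch_testing JOIN Path ON (batch_testing.Path = Path.Path)
                     JOIN Filename ON (batch_testing.Name = Filename.Name)
  -- Match the column order of whichever index you keep:
  ORDER BY batch_testing.JobId, Path.PathId, Filename.FilenameId;
```

Only one index can benefit fully from any given sort order, so it makes sense to order by the largest or most expensive remaining index.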