> On Feb 10, 2016, at 2:47 AM, Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
>
> On Tue, Feb 9, 2016 at 4:09 PM, Dan Langille <dan@xxxxxxxxxxxx> wrote:
>> I have a wee database server which regularly tries to insert 1.5 million or
>> even 15 million new rows into a 400 million row table. Sometimes these
>> inserts take hours.
>>
>> The actual query that produces the join is fast. It's the insert which is
>> slow.
>>
>> INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5,
>>                   DeltaSeq)
>>   SELECT batch_testing.FileIndex, batch_testing.JobId, Path.PathId,
>>          Filename.FilenameId, batch_testing.LStat, batch_testing.MD5,
>>          batch_testing.DeltaSeq
>>     FROM batch_testing
>>     JOIN Path     ON (batch_testing.Path = Path.Path)
>>     JOIN Filename ON (batch_testing.Name = Filename.Name);
>>
>> This is part of the plan: http://img.ly/images/9374145/full created via
>> http://tatiyants.com/pev/#/plans
>>
>> This gist contains postgresql.conf, zfs settings, slog, disk partitions:
>>
>> https://gist.github.com/dlangille/33331a8c8cc62fa13b9f
>
> The table you are inserting into has 7 indexes, all of which have to
> be maintained. The index on the sequence column should be efficient
> to maintain. But for the rest, if the inserted rows are not naturally
> ordered by any of the indexed columns then it would end up reading 6
> random scattered leaf pages in order to insert row pointers. If none
> of those pages are in memory, that is going to be slow to read off the
> hdd in single file. Also, you are going to dirty all of those scattered
> pages, and they will be slow to write back to hdd because there
> probably won't be much opportunity for write-combining.
>
> Do you really need all of those indexes?
>
> Won't the index on (jobid, pathid, filenameid) service any query that
> (jobid) does, so you can get rid of the latter?
>
> And unless you have range queries on fileindex, like "where jobid = 12
> and fileindex between 4 and 24", then you should be able to replace
> (jobid, fileindex) with (fileindex, jobid) and then get rid of the
> stand-alone index on (fileindex).
>
> If you add an "order by" to the select statement which orders by the
> fields of one of the remaining indexes, then you could make the
> maintenance of that index much cheaper.

I will make these changes one-by-one and test each. This will be interesting.

> Could you move the indexes for this table to SSD?

Now that's a clever idea.

bacula=# select pg_size_pretty(pg_indexes_size('file'));
 pg_size_pretty
----------------
 100 GB
(1 row)

bacula=# select pg_size_pretty(pg_table_size('file'));
 pg_size_pretty
----------------
 63 GB
(1 row)

bacula=#

Not surprising that the indexes are larger than the data.

The SSD is 30GB; I don't have enough space there. Buying 2x500GB SSDs would
allow me to put all the data onto SSD. I'm using about 306G for the
databases now.

> SSD is probably wasted on your WAL. If your main concern is bulk
> insertions, then WAL is going to be written sequentially with few fsyncs.
> That is ideal for HDD. Even if you also have smaller transactions,

OK.

> WAL is still sequentially written as long as you have a non-volatile
> cache on your RAID controller which can absorb fsyncs efficiently.

Of note, there is no RAID controller or non-volatile cache here. I'm running
ZFS with plain HBA controllers.

Thank you. I have some interesting changes to test.
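
For my own notes, here is roughly what I plan to test, one change at a time.
The index names below are guesses until I check \d file, and the SSD
tablespace is hypothetical:

-- 1. Drop the single-column index on (jobid); the composite
--    (jobid, pathid, filenameid) index should cover those queries.
DROP INDEX file_jobid_idx;

-- 2. Replace (jobid, fileindex) with (fileindex, jobid), then drop the
--    stand-alone index on (fileindex).
DROP INDEX file_jobid_fileindex_idx;
CREATE INDEX file_fileindex_jobid_idx ON file (fileindex, jobid);
DROP INDEX file_fileindex_idx;

-- 3. Order the bulk insert to match one of the remaining indexes so its
--    leaf pages are touched sequentially rather than at random.
INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq)
  SELECT batch_testing.FileIndex, batch_testing.JobId, Path.PathId,
         Filename.FilenameId, batch_testing.LStat, batch_testing.MD5,
         batch_testing.DeltaSeq
    FROM batch_testing
    JOIN Path     ON (batch_testing.Path = Path.Path)
    JOIN Filename ON (batch_testing.Name = Filename.Name)
   ORDER BY batch_testing.JobId, Path.PathId, Filename.FilenameId;

-- 4. If I free up enough SSD space, move an index to it via a tablespace.
-- CREATE TABLESPACE ssd_indexes LOCATION '/ssd/pg_indexes';
-- ALTER INDEX file_jpfid_idx SET TABLESPACE ssd_indexes;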
--
Dan Langille - BSDCan / PGCon
dan@xxxxxxxxxxxx