1) What is the size of the whole database? Does that fit in your memory? That's the first thing I'd like to know and I can't find it in your post.
Current on-disk size is about 51 gig. I'm not sure if there's a different size I should be looking at instead, but that's what du tells me the directory for the database in the "base" directory is sized at. So, no, it doesn't fit into memory all the way.
I'm missing several other important values too - namely shared_buffers max_fsm_pages work_mem maintenance_work_mem
I didn't share these because they've been in flux :) I've been experimenting with different values, but currently we're using:
8GB shared_buffers 100000 max_fsm_pages 256MB work_mem 6GB maintenance_work_mem
BTW, is the autovacuum daemon running? If yes, try to stop it during the import (and run ANALYZE after the import of data).
No. all vacuums are done explicitly since the database doesn't change during the day. The 'order of operations' is:
- Load COBOL data into database (inserts/updates) - VACUUM COBOL data - Summarize COBOL data (inserts/updates with the big table using COPY) - VACUUM summary tables So everything gets vacuumed as soon as it's updated.
2) What is the size of a disc page? Without that we can only guess what doest the effective_cache_size number means - in the usual case it's 8kB thus giving about 5.2 GiB of memory.
I believe it's 8kB. I definitely haven't changed it :)
As suggested in http://www.powerpostgresql.com/PerfList I'd increase that to about 1.400.000 which about 10.5 GiB (about 2/3 of RAM). Anyway - don't be afraid this breaks something. This is just an information for PostgreSQL how much memory the OS is probably using as a filesystem cache. PostgreSQL uses this to evaluate the probability that the page is in a cache.
Okay, I'll try the value you recommend. :)
3) What is the value of maintenance_work_mem? This is a very important value for CREATE INDEX (and some other). The lower this value is, the slower the CREATE INDEX is. So try to increase the value as much as you can - this could / should improve the import performance considerably. But be careful - this does influence the amount of memmory allocated by PostgreSQL. Being in your position I wouldn't do this in the postgresql.conf - I'd do that in the connection used by the import using SET command, ie. something like SET maintenance_work_mem = 524288; CREATE INDEX ... CREATE INDEX ... CREATE INDEX ... CREATE INDEX ... for a 512 MiB of maintenance_work_mem. Maybe even a higher value could be used (1 GiB?). Just try to fiddle with this a little.
It's currently at 6GB in postgres.conf, though you have a good point in that maybe that should be before the indexes are made to save room. Things are certainly kinda tight in the config as is.
4) Try to set up some performance monitoring - for example a 'dstat' is a nice way to do that. This way you can find yout where's the bottleneck (memory, I/O etc.) That's basically all I can think of right now.
Thanks for the tips :) Steve