My client just informed me that new hardware is available for our DB
server.
. Intel Core 2 Quad processors
. 48 GB RAM
. 4-disk RAID array (RAID level TBD)
I have put the ugly details of what we do with our DB below, as well as the
postgresql.conf settings. But, to summarize: we have a PostgreSQL 8.3.6 DB
with very large tables, and the server is always busy serving a constant
stream of single-row UPDATEs and INSERTs from parallel automated processes.
There are fewer than 10 users, as the server is devoted to the KB production
system.
My questions:
1) Which RAID level would you recommend?
2) Which Windows OS would you recommend? (currently 2008 x64 Server)
3) If we were to port to a *NIX flavour, which would you recommend? (One
that supports trouble-free PG builds/makes, please!)
4) Is this the right PG version for our needs?
Thanks,
Carlo
The details of our use:
. The DB hosts a data warehouse and a knowledgebase (KB) tracking the
professional information of 1.3M individuals.
. The KB tables related to these 1.3M individuals are naturally also large.
. The DB is in a perpetual state of serving TCL-scripted Extract, Transform
and Load (ETL) processes.
. These ETL processes typically run 10 at a time (i.e. in parallel).
. We would like to run more, but the server appears to be the bottleneck.
. The ETL write processes are 99% single-row UPDATEs or INSERTs (see the
write-pattern sketch after this list).
. There are few, if any, DELETEs.
. The ETL source data are "import tables".
. The import tables are permanently kept in the data warehouse so that we
can trace the original source of any information.
. There are 6,000+ import tables, and counting.
. The import tables range from dozens to hundreds of thousands of rows.
They rarely require more than a pkey index.
. Linking the KB to the source import date requires an "audit table" of
500M rows, and counting.
. The size of the audit table makes it very difficult to manage, especially
if we need to modify the design.
. Because we query the audit table in different ways to audit the ETL
processes' decisions, almost every column in the audit table is indexed (see
the index sketch after this list).
. The maximum number of physical users is 10, and these users RARELY
perform any kind of write.
. By contrast, the 10+ ETL processes are writing constantly.
. We find that internal stats drift, for whatever reason, causing sequential
scans instead of index scans.
. So far, we have never seen a situation where a seq scan has improved
performance, which I would attribute to the size of the tables.
. We believe our requirements are exceptional, and we would benefit
immensely from setting up the PG planner to always favour index-oriented
decisions, which seems to contradict everything that PG advice suggests as
best practice (see the planner sketch after this list).
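
To make the write pattern above concrete, each ETL process essentially
issues a long stream of statements along these lines. The table and column
names are invented for illustration only, not our actual schema:

    -- typical single-row INSERT issued by one ETL process (illustrative names)
    INSERT INTO kb_person_fact (person_id, import_id, attr_name, attr_value)
    VALUES (1234567, 890, 'specialty', 'cardiology');

    -- typical single-row UPDATE issued by one ETL process (illustrative names)
    UPDATE kb_person_fact
       SET attr_value = 'cardiology', import_id = 890
     WHERE person_id = 1234567 AND attr_name = 'specialty';

With 10+ of these processes running in parallel, the server sees a constant
stream of small writes rather than occasional large batches.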
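
For the audit table, "almost every column is indexed" means a set of
single-column indexes roughly like the following (structure and names are
simplified and illustrative, not the real DDL):

    CREATE TABLE etl_audit (
        audit_id      bigint PRIMARY KEY,
        import_id     integer,
        person_id     integer,
        etl_run_id    integer,
        decision_code smallint,
        created_at    timestamp
    );
    CREATE INDEX etl_audit_import_id_idx     ON etl_audit (import_id);
    CREATE INDEX etl_audit_person_id_idx     ON etl_audit (person_id);
    CREATE INDEX etl_audit_etl_run_id_idx    ON etl_audit (etl_run_id);
    CREATE INDEX etl_audit_decision_code_idx ON etl_audit (decision_code);
    CREATE INDEX etl_audit_created_at_idx    ON etl_audit (created_at);

Every one of those indexes has to be maintained on every single-row INSERT,
which is part of why the 500M-row table is so hard to manage.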
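
As for the stats drift and the wish to bias the planner towards index scans,
what we have in mind is roughly the following (a sketch only; the table and
column names are illustrative, and we have not settled on exact values):

    -- raise the sample size for columns whose stats seem to drift, then
    -- re-analyze (available in 8.3; names are illustrative)
    ALTER TABLE etl_audit ALTER COLUMN person_id SET STATISTICS 200;
    ANALYZE etl_audit;

    -- bias the planner towards index scans, per session or in postgresql.conf
    SET random_page_cost = 2.0;  -- default is 4.0
    SET enable_seqscan = off;    -- heavy-handed: the planner will still fall
                                 -- back to a seq scan if no index plan exists

We realise that disabling enable_seqscan outright is the extreme option;
lowering random_page_cost is the gentler one.
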
Current non-default conf settings are:
autovacuum = on
autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 250
autovacuum_naptime = 1min
autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_threshold = 500
bgwriter_lru_maxpages = 100
checkpoint_segments = 64
checkpoint_warning = 290
datestyle = 'iso, mdy'
default_text_search_config = 'pg_catalog.english'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
log_destination = 'stderr'
log_line_prefix = '%t '
logging_collector = on
maintenance_work_mem = 16MB
max_connections = 200
max_fsm_pages = 204800
max_locks_per_transaction = 128
port = 5432
shared_buffers = 500MB
vacuum_cost_delay = 100
work_mem = 512MB
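
For context when weighing these settings against the new 48 GB of RAM:
work_mem is allocated per sort/hash operation per backend, not per server,
so the settings above interact roughly like this (back-of-envelope figures,
not measurements):

    -- work_mem = 512MB, max_connections = 200, shared_buffers = 500MB
    --   10 ETL backends, 1 sort each:  10 x 512MB ~= 5 GB
    --   200 backends, 1 sort each:    200 x 512MB ~= 100 GB (theoretical worst case)
    SHOW work_mem;
    SHOW max_connections;
    SHOW shared_buffers;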