Performace Optimization for Dummies

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I am a software developer who is acting in a (temporary) dba role for a 
project. I had recommended PostgreSQL be brought in to replace the proposed 
MySQL DB - I chose PostgreSQL because of its reputation as a more stable 
solution than MySQL.

At this early stage in the project, we are initializing our portal's 
database with millions of rows of imported data in over 50 different 
flattened tables; each table's structure is unique to the data provider. 
This requires a pretty complex import program, because the data must be 
matched semantically, not literally. Even with all of the expression 
matching and fuzzy logic in the code,our performance statistics show that 
the program spends over 75% of its time in SQL queries looking for matching 
and/or duplicate data.

The import is slow - and degrades as the tables grow. With even more 
millions of rows in dozens of import tables to come, the imports will take 
forever. My ability to analyse the queries is limited; because of the nature 
of the import process, the SQL queries are mutable, every imported row can 
change the structure of a SQL query as the program adds and subtracts search 
conditions to the SQL command text before execution. The import program is 
scripted in Tcl. An attempt to convert our queries to prepared queries 
(curiousy) did not bring any performance improvements, and we converted back 
to simplify the code.

We urgently need a major performance improvement. We are running the 
PostgreSQL 8.1.4 on a Windows 2003 x64 Server on a dual processor, dual core 
3.2Ghz Xeon box with 4gb RAM and a RAID (sorry, I don't know what type) disc 
subsystem. Sorry about the long intro, but here are my questions:

1) Are we paying any big penalties by running Windows vs LINUX (or any other 
OS)?

2) Has the debate over PostgreSQL and Xeon processors been settled? Is this 
a factor?

3) Are there any easy-to-use performance analysis/optimisation tools that we 
can use? I am dreaming of one that could point out problems and suggest 
and.or effect solutions.

4) Can anyone recommend any commercial PostgreSQL service providers that may 
be able to swiftly come in and assist us with our performance issues?

Below, please find what I believe are the configuration settings of interest 
in our system

Any help and advice will be much appreciated. TIA,

Carlo

max_connections = 100
shared_buffers = 50000
work_mem = 32768
maintenance_work_mem = 32768
checkpoint_segments = 128
effective_cache_size = 10000
random_page_cost = 3
stats_start_collector = on
stats_command_string = on
stats_row_level = on
autovacuum = on









[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux