Dear all
I am having a problem of high cpu loads in my postgres server during peak time. Following are the
details of my setup (details as per the postgres wiki) .
* PostgreSQL version
o Run "select pg_version();" in psql or PgAdmin III and provide the full, exact output.
clusternode2:~ # rpm -qa | grep postgres
postgresql-devel-8.1.9-1.2
postgresql-8.1.9-1.2
postgresql-docs-8.1.9-1.2
postgresql-server-8.1.9-1.2
postgresql-libs-64bit-8.1.9-1.2
postgresql-libs-8.1.9-1.2
postgresql-jdbc-8.1-12.2
postgresql-contrib-8.1.9-1.2
* A description of what you are trying to achieve and what results you expect.
To keep the CPU Load below 10 , Now during peak times the load is nearing to 40
At that time , it is not possible to access the data.
* The EXACT text of the query you ran, if any
* The EXACT output of that query if it's short enough to be reasonable to post
o If you think the output is wrong, what you think should've been produced instead
* The EXACT error message you get, if there is one
As of now , i am unable to locate the exact query, the load shoots up abnormally during
peak time is the main problem .
* What program you're using to connect to PostgreSQL
Jakarta Tomcat - Struts with JSP
* What version of the ODBC/JDBC driver you're using, if any
postgresql-jdbc-8.1-12.2
* What you were doing when the error happened / how to cause the error. Describe in as much detail as possible, step by step, including command lines, SQL output, etc.
When certain tables with more than 3 lakh items are concurrently accessed by more than 300
users, the CPU load shoots up .
* Is there anything remotely unusual in the PostgreSQL server logs?
o On Windows these are in your data directory. On a default PostgreSQL install that'll be in C:\Program Files\PostgreSQL\8.4\data\pg_log (assuming you're using 8.4)
The log file /var/log/postgresql has no data .
o On Linux this depends a bit on distro, but you'll usually find them in /var/log/postgresql/.
* Operating system and version
o Linux users:
+ Linux distro and version
+ Kernel details (run "uname -a" on the terminal)
SLES 10 SP3
clusternode2:~ # uname -a
Linux clusternode2 2.6.16.46-0.12-ppc64 #1 SMP Thu May 17 14:00:09 UTC 2007 ppc64 ppc64 ppc64 GNU/Linux
* What kind of hardware you have.
o CPU manufacturer and model, eg "AMD Athlon X2" or "Intel Core 2 Duo"
o Amount and size of RAM installed, eg "2GB RAM"
High Availability Cluster with two IBM P Series Server and one DS4700 Storage
IBM P series P52A with 2-core 2.1 Ghz POWER5+ Processor Card , 36 MB L3 Cache ,16 GB of RAM,
73.4 GB 10,000 RPM Ultra320 SCSI Drive for Operating System .
o Storage details (important for performance and corruption questions)
+ Do you use a RAID controller? If so, what type of controller? eg "3Ware Escalade 8500-8"
# Does it have a battery backed cache module?
# Is write-back caching enabled?
+ Do you use software RAID? If so, what software and what version? eg "Linux software RAID (md) 2.6.18-5-686 SMP mod_unload 686 REGPARM gcc-4.1".
# In the case of Linux software RAID you can get the details from the "modinfo md_mod" command
+ Is your PostgreSQL database on a SAN?
# Who made it, what kind, etc? Provide what details you can.
+ How many hard disks are connected to the system and what types are they? You need to say more than just "6 disks". At least give maker, rotational speed and interface type, eg "6 15,000rpm Seagate SAS disks".
+ How are your disks arranged for storage? Are you using RAID? If so, what RAID level(s)? What PostgreSQL data is on what disks / disk sets? What file system(s) are in use?
# eg: "Two disks in RAID 1, with all PostgreSQL data and programs stored on one ext3 file system."
# eg: "4 disks in RAID 5 holding the pg data directory on an ext3 file system. 2 disks in RAID 1 holding pg_clog, pg_xlog, the temporary tablespace, and the sort scratch space, also on ext3.".
# eg: "Default Windows install of PostgreSQL"
+ In case of corruption data reports:
# Have you had any unexpected power loss lately?
# Have you run a file system check? (chkdsk / fsck)
# Are there any error messages in the system logs? (unix/linux: "dmesg", "/var/log/syslog" ; Windows: Event Viewer in Control Panel -> Administrative Tools )
IBM SAN DS4700 Storage with Fibre Channel HDD (73.4 GB * 10)
Two Partitions - 73.4 GB * 3 RAID 5 - 134 GB storage partitions (One holding Jakarata tomcat
application server and other holding Postgresql Database) .
Four Hard disk RAID 5 with ext3 file systems hold the pgdata on SAN .
Hard disk rotational speed is 73 GB 15K IBM 2 GB Fibre channel
No power loss, filesystem check also fine, No errors on /var/log/syslog
Following is the output of TOP command during offpeak time.
top - 18:36:56 up 77 days, 20:33, 1 user, load average: 12.99, 9.22, 10.37
Tasks: 142 total, 12 running, 130 sleeping, 0 stopped, 0 zombie
Cpu(s): 46.1%us, 1.9%sy, 0.0%ni, 6.1%id, 3.0%wa, 0.0%hi, 0.1%si, 42.9%st
Mem: 16133676k total, 13657396k used, 2476280k free, 450908k buffers
Swap: 14466492k total, 124k used, 14466368k free, 11590056k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
22458 postgres 19 0 2473m 477m 445m R 40 3.0 0:15.49 postmaster
22451 postgres 15 0 2442m 447m 437m S 33 2.8 0:30.44 postmaster
22464 postgres 17 0 2443m 397m 383m R 28 2.5 0:13.78 postmaster
22484 postgres 16 0 2448m 431m 412m S 20 2.7 0:02.73 postmaster
22465 postgres 17 0 2440m 461m 449m R 15 2.9 0:03.52 postmaster
22452 postgres 16 0 2450m 727m 706m R 13 4.6 0:23.46 postmaster
22476 postgres 16 0 2437m 413m 405m S 13 2.6 0:06.11 postmaster
22485 postgres 16 0 2439m 230m 222m R 7 1.5 0:05.72 postmaster
22481 postgres 15 0 2436m 175m 169m S 7 1.1 0:04.44 postmaster
22435 postgres 17 0 2438m 371m 361m R 6 2.4 1:17.92 postmaster
22440 postgres 17 0 2445m 497m 483m R 5 3.2 1:44.50 postmaster
22486 postgres 17 0 2432m 84m 81m R 4 0.5 0:00.76 postmaster
3 root 34 19 0 0 0 R 0 0.0 1:47.50 ksoftirqd/0
4726 root 15 0 29540 8776 3428 S 0 0.1 140:02.98 X
24950 root 15 0 0 0 0 S 0 0.0 0:30.96 pdflush
1 root 16 0 812 316 280 S 0 0.0 0:13.29 init
2 root RT 0 0 0 0 S 0 0.0 0:01.46 migration/0
4 root RT 0 0 0 0 S 0 0.0 0:00.78 migration/1
5 root 34 19 0 0 0 S 0 0.0 1:36.79 ksoftirqd/1
6 root RT 0 0 0 0 S 0 0.0 0:01.46 migration/2
7 root 34 19 0 0 0 R 0 0.0 1:49.83 ksoftirqd/2
8 root RT 0 0 0 0 S 0 0.0 0:00.79 migration/3
9 root 34 19 0 0 0 S 0 0.0 1:38.18 ksoftirqd/3
10 root 10 -5 0 0 0 S 0 0.0 1:02.11 events/0
11 root 10 -5 0 0 0 S 0 0.0 1:03.27 events/1
12 root 10 -5 0 0 0 S 0 0.0 1:01.76 events/2
13 root 10 -5 0 0 0 S 0 0.0 1:02.29 events/3
14 root 10 -5 0 0 0 S 0 0.0 0:00.01 khelper
1016 root 10 -5 0 0 0 S 0 0.0 0:00.00 kthread
1054 root 10 -5 0 0 0 S 0 0.0 0:03.08 kblockd/0
1055 root 10 -5 0 0 0 S 0 0.0 0:02.83 kblockd/1
1056 root 10 -5 0 0 0 S 0 0.0 0:03.19 kblockd/2
The CPU Load shoots upto 40 during peak time.
Following is my postgresql.conf (without comments)
hba_file = '/var/lib/pgsql/data/pg_hba.conf'
listen_addresses = '*'
port = 5432
max_connections = 1800
shared_buffers = 300000
max_fsm_relations = 1000
effective_cache_size = 200000
log_destination = 'stderr'
redirect_stderr = on
log_rotation_age = 0
log_rotation_size = 10240
silent_mode = %d %u '
autovacuum = on
datestyle = 'iso, dmy'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
User Access
Total Number of Users is 500
Maximum number of Concurrent users will be 500 during peak time
Off Peak time the maximum number of concurrent user will be around 150 to 200.
Please let me know your suggestions to improve the performance.
Regards
Shiva Raman