On Wed, Jan 26, 2011 at 10:16 AM, Shaun Thomas <sthomas@xxxxxxxxx> wrote:
> Worse however, is your checkpoints. Lord. Increase checkpoint_segments to
> *at least* 20, and increase your checkpoint_completion_target to 0.7 or 0.8.
> Check your logs for checkpoint warnings, and I'll bet it's constantly
> complaining about increasing your checkpoint segments. Every checkpoint not
> started by the scheduled system risks a checkpoint spike, which can flood
> your system with IO regardless of which queries are running. That kind of IO
> storm will ruin your performance, and with only 3 checkpoint segments on a
> busy database, are probably happening constantly.

To Shaun: Unless she's not write bound but read bound. We can't tell, because we haven't seen the queries, and we haven't seen the output of iostat or vmstat.

To Anne: Another tool to recommend is sar. It's part of the sysstat package on debian / ubuntu / rhel. You have to enable it first (it'll tell you how when you try to run it right after installing it). It lets you look back over the last 7 days, 5 minutes at a time, to see the trends on your servers. Very useful stuff, and easy to graph in a spreadsheet or web page. Or just read it.

For instance, here's the output of sar on the data drive of a slave (read only) server under slony replication:

sar -d -f sa25 | grep "02:[01234].:.. AM.\+dev251-1"

Linux 2.6.32-27-server ()    01/25/2011    _x86_64_    (16 CPU)

12:00:01 AM       DEV       tps   rd_sec/s   wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
09:45:01 AM  dev251-1    481.21    6981.74    1745.82     18.14      4.86     10.10      1.57     75.65
09:55:01 AM  dev251-1    620.16   28539.52    2135.67     49.46      5.25      8.47      1.22     75.42
10:05:01 AM  dev251-1   1497.16   29283.95    1898.94     20.83     13.89      9.28      0.64     96.52
10:15:01 AM  dev251-1   1040.47   17123.89    2286.10     18.66      8.89      8.54      0.87     90.49
10:25:01 AM  dev251-1    562.97    8802.77    1515.50     18.33      4.84      8.60      1.41     79.57

Let me interpret for ya, in case it's not obvious. IO utilization runs from about 75% up to over 95%. When it's pushing 90% and above we're running 1,000 to 1,500 tps, reading at a maximum of about 15MB a second (rd_sec/s is 512-byte sectors) and writing a paltry 1MB or so a second. Average wait stays around 10ms.

If we use sar -u from the same time period, we can match %iowait up to this chart and see if we were really waiting on IO or not:

12:00:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
09:45:01 AM     all     47.44      0.00      5.20      4.94      0.00     42.42
09:55:01 AM     all     46.42      0.00      5.63      5.77      0.00     42.18
10:05:01 AM     all     48.64      0.00      6.35     11.87      0.00     33.15
10:15:01 AM     all     46.94      0.00      5.79      8.81      0.00     38.46
10:25:01 AM     all     48.68      0.00      5.58      5.42      0.00     40.32

We can see that at peak about 11% of our CPU power is waiting behind IO. We have 16 CPUs, so each one is 100/16, or 6.25%, of the total. So at 11% we've basically got the equivalent of two cores on hold the whole time. In real life on this machine ALL the cpus are waiting about 11% of the time across the board, but the math comes out the same: we're waiting on IO.
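In case sar isn't collecting yet, here's a rough sketch of turning it on. The package name is the same everywhere, but the paths and commands below are the Debian/Ubuntu flavor (RHEL keeps its daily files under /var/log/sa and uses the sysstat service instead), so adjust to taste:

sudo apt-get install sysstat
# Debian/Ubuntu ships with collection turned off; flip the switch and restart:
sudo sed -i 's/ENABLED="false"/ENABLED="true"/' /etc/default/sysstat
sudo /etc/init.d/sysstat restart
# From then on cron drops one saNN file per day into /var/log/sysstat
# (sa25 above is the 25th), which is what sar -f reads back.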
Here's a heavy db server, lots of ram, same time period. sdh is one of a large number of disks in a RAID-10 array; md17 is that RAID-10 array (actually a RAID-0 on top of a bunch of RAID-1s, since I still don't trust linux's RAID-10 implementation).

12:00:01 AM       DEV       tps   rd_sec/s   wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
09:45:01          sdh      5.86       5.65     158.87     28.08      0.21     35.07      3.36      1.97
09:45:01         md17    253.78     168.69    1987.64      8.50      0.00      0.00      0.00      0.00
09:55:01          sdh      5.48       5.65     134.99     25.68      0.16     30.00      3.31      1.81
09:55:01         md17    215.13     157.56    1679.67      8.54      0.00      0.00      0.00      0.00
10:05:01          sdh      4.37       5.39     106.53     25.58      0.09     21.61      3.57      1.56
10:05:01         md17    170.81     133.76    1334.43      8.60      0.00      0.00      0.00      0.00
10:15:01          sdh      6.16       5.37     177.25     29.64      0.25     40.95      3.38      2.08
10:15:01         md17    280.63     137.88    2206.95      8.36      0.00      0.00      0.00      0.00
10:25:01          sdh      4.52       3.72     116.41     26.56      0.09     20.64      3.58      1.62
10:25:01         md17    187.65     107.59    1470.88      8.41      0.00      0.00      0.00      0.00

(Note that md devices do not show values for %util, svctm or await, hence the need to look at sdh as well.)

This db fits the data set in ram; the other machine doesn't. It had a caching RAID controller, but that caught fire and burned down. The next one caught fire, burned down, and fell into the swamp. It now has a straight-up SAS controller with no caching. Numbers were even better when it had a caching RAID controller, but I got tired of replacing them.

OK, so md17 is handling only around 280 tps at most, while the array on the other, smaller server was doing around 1,000. The master is reading 5 or 6 sectors per second per spindle, and only 100 to 170 across the whole array, while the slave is reading up to 30k sectors a second. The master is writing at ~1,500 to 2,000 sectors a second; the slave is similar.

The slave server above was badly IO bound because it A: didn't have enough memory to cache the data set, and B: hadn't had time to warm up and put what memory it did have to work. It was thrown into the mix mid morning rush and fell flat on its ass. If it had been warmed up first (running it at a 100:1 load factor in our load balancing module to start) it would have been ok. It would still have had horrible IO numbers, though. Once the caches had loaded, by 1500hrs the slave was reading at just 500 sectors/sec, child's play really.

So get sar running, and get some numbers from the machine when these things are happening. Look at how things look before, during, and after the crisis.
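If it helps, here's a rough sketch of what pulling that before/during/after window can look like; sa25 and the 09:00 to 11:00 range are just placeholders for your own problem day and hours (Debian/Ubuntu keeps the daily files under /var/log/sysstat, RHEL under /var/log/sa):

# Disk activity for a window around the incident (swap in your own device name):
sar -d -f /var/log/sysstat/sa25 -s 09:00:00 -e 11:00:00 | grep dev251-1

# CPU and %iowait for the same window, to see whether you were actually stuck on IO:
sar -u -f /var/log/sysstat/sa25 -s 09:00:00 -e 11:00:00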