On Mon, Mar 16, 2009 at 2:50 PM, Joe Uhl <joeuhl@xxxxxxxxx> wrote:
> I dropped the pool sizes and brought things back up. Things are stable,
> site is fast, CPU utilization is still high. Probably just a matter of
> time before the issue comes back (we get slammed as kids get out of
> school in the US).

Yeah, I'm guessing your server (or more specifically its RAID card) just
isn't up to the task. We had the same problem last year with a machine
with 16 Gig of RAM and dual dual-core 3.0GHz Xeons with a Perc 5
something or other. No matter how we tuned it or played with it, we just
couldn't get good random performance out of it. It's since been replaced
by a white-box unit with a Tyan mobo, dual 4-core Opterons, an Areca
1680, and a 12-drive RAID-10. We can sustain 30 to 60 Megs a second of
random access with 0 to 10% iowait.

Here's a typical vmstat 10 output when our load factor is hovering
around 8...

 r  b  swpd    free   buff     cache  si  so    bi     bo    in     cs  us sy id wa st
 4  1   460  170812  92856  29928156   0   0   604   3986  4863  10146  74  3 20  3  0
 7  1   460  124160  92912  29939660   0   0   812   5701  4829   9733  70  3 23  3  0
13  0   460  211036  92984  29947636   0   0   589   3178  4429   9964  69  3 25  3  0
 7  2   460   90968  93068  29963368   0   0  1067   4463  4915  11081  78  3 14  5  0
 7  3   460  115216  93100  29963336   0   0  3008   3197  4032  11812  69  4 15 12  0
 6  1   460  142120  93088  29923736   0   0  1112   6390  4991  11023  75  4 15  6  0
 6  0   460  157896  93208  29932576   0   0   698   2196  4151   8877  71  2 23  3  0
11  0   460  124868  93296  29948824   0   0   963   3645  4891  10382  74  3 19  4  0
 5  3   460   95960  93272  29918064   0   0   592  30055  5550   7430  56  3 18 23  0
 9  0   460   95408  93196  29914556   0   0  1090   3522  4463  10421  71  3 21  5  0
 9  0   460  128632  93176  29916412   0   0   883   4774  4757  10378  76  4 17  3  0

Note the bursty parts where we're shoving out 30 Megs a second and the
wait jumps to 23%. That's about as bad as it gets during the day for us.

Note that in your graph the bi column appears to be dominating the bo
column, so it looks like you're reaching a point where the write cache
on the controller fills up: your real outbound throughput drops to
roughly 1 megabyte a second, and the inbound traffic either has priority
or is just filling in the gaps. It looks to me like your RAID card is
prioritizing reads over writes, and the whole system is slowing to a
crawl. I'm willing to bet that if you were running pure SW RAID with no
RAID controller you'd get better numbers.
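
In case it helps you catch this in the act, here's a rough Python
sketch that tails vmstat 10 and flags samples where block-out or iowait
spikes, like the 30055 / 23% row above. The column indices assume the
standard Linux vmstat layout shown above, and the thresholds are
arbitrary, so tune them to your box:

    import subprocess

    # Watch `vmstat 10` and flag samples where block-out (bo) or
    # iowait (wa) bursts. In the standard Linux vmstat layout above,
    # bo is the 10th field and wa the 16th (0-indexed 9 and 15).
    # Thresholds are arbitrary; adjust for your workload.
    BO_LIMIT, WA_LIMIT = 20000, 15

    proc = subprocess.Popen(["vmstat", "10"], stdout=subprocess.PIPE,
                            text=True)
    for line in proc.stdout:
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the two header lines vmstat prints
        bo, wa = int(fields[9]), int(fields[15])
        if bo > BO_LIMIT or wa > WA_LIMIT:
            print("burst: bo=%d blocks/s, wa=%d%%" % (bo, wa))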
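
And if you want to poke at the write-cache theory directly, a crude
probe like the one below (file name, sizes, and counts are all made up,
so treat it as a sketch rather than a proper benchmark) does fsync'd
random 8K writes and reports the sustained rate. Run it long enough that
the total written exceeds the controller's cache, then compare the
steady-state number on your current setup against a plain SW RAID
volume:

    import os, random, time

    # Crude random-write probe: 8 KB writes at random offsets in a
    # 1 GB file, fsync'd one at a time so the OS page cache can't
    # absorb them. 50000 writes is ~400 MB, enough to overflow a
    # typical controller write cache and show the disks' real rate.
    PATH, SIZE, BLOCK, COUNT = "probe.dat", 1 << 30, 8192, 50000

    fd = os.open(PATH, os.O_RDWR | os.O_CREAT)
    os.ftruncate(fd, SIZE)
    buf = os.urandom(BLOCK)
    start = time.time()
    for _ in range(COUNT):
        os.lseek(fd, random.randrange(SIZE // BLOCK) * BLOCK,
                 os.SEEK_SET)
        os.write(fd, buf)
        os.fsync(fd)
    elapsed = time.time() - start
    os.close(fd)
    print("%.0f writes/sec, %.2f MB/s" %
          (COUNT / elapsed, COUNT * BLOCK / elapsed / 1e6))

Keep in mind that with a battery-backed write-back cache the early
fsyncs return as soon as data hits the controller's RAM, so only the
long-run rate tells you anything about the spindles.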