On Mon, Jan 11, 2010 at 8:50 AM, Scott Marlowe <scott.marlowe@xxxxxxxxx> wrote:
RAID-0
We have an application server that is processing requests. Each request consists of a combination of selects, inserts, and deletes. We actually see degredation when we get more than 40 concurrent requests. The exact number of queries executed by each request isn't known. It varies per request. But, an example request would be about 16 inserts and 113 selects. Any given request can't execute more than a single query at a time.
To be more specific about the degradation, we've set the "log_min_duration_statement=200", and when we run with 40 concurrent requests, we don't see queries showing up in there. When we run with 60 concurrent requests, we start seeing queries show up, and when we run 200+ requests, we see multi-second queries.
This is to be expected, to some extent, as we would expect some perfromance degradation with higher utilization. But, the hardware doesn't appear to be very busy, and that's where we're hoping for some help.
We want to have Postgres eating up as many resources as possible to chug through our queries faster. Right now, it's running slower with more utilization, but there's still too much idle time for the CPUs.
Here's some iostat output (from the 3rd data point from iostat -x 10)... this was taken while we were processing 256 simultaneous requests.
avg-cpu: %user %nice %system %iowait %steal %idle
34.29 0.00 7.09 0.03 0.00 58.58
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 112.20 0.00 133.40 0.00 1964.80 14.73 0.42 3.17 0.04 0.48
sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda5 0.00 112.20 0.00 133.40 0.00 1964.80 14.73 0.42 3.17 0.04 0.48
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.40 0.00 3.20 8.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 99.90 0.00 799.20 8.00 0.15 1.50 0.01 0.12
dm-3 0.00 0.00 0.00 0.60 0.00 4.80 8.00 0.00 0.33 0.17 0.01
dm-4 0.00 0.00 0.00 144.70 0.00 1157.60 8.00 0.46 3.17 0.02 0.35
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
The iowait seems pretty low, doesn't it?
Here's the vmstat output, with the 6th data element clipped out, which seems representative of the whole... (also taken during 256 simultaneous request)
[root@ecpe1 pg_log]# vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
8 0 0 21818136 696692 3871172 0 0 0 1042 2261 48418 34 7 59 0 0
That's a lot of free mem, which is to be expected. Our database is not very large.
(taken during 256 simultaneous requests)
top - 10:11:43 up 12 days, 20:48, 6 users, load average: 10.48, 4.16, 2.83
Tasks: 798 total, 8 running, 790 sleeping, 0 stopped, 0 zombie
Cpu0 : 33.3%us, 7.6%sy, 0.0%ni, 59.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 31.6%us, 5.9%sy, 0.0%ni, 62.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 33.0%us, 6.6%sy, 0.0%ni, 60.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 35.4%us, 6.2%sy, 0.0%ni, 58.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu4 : 36.3%us, 5.6%sy, 0.0%ni, 58.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 37.4%us, 6.2%sy, 0.0%ni, 56.1%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu6 : 38.1%us, 6.0%sy, 0.0%ni, 56.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 39.2%us, 7.5%sy, 0.0%ni, 52.9%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu8 : 35.5%us, 7.2%sy, 0.0%ni, 56.9%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu9 : 37.8%us, 7.6%sy, 0.0%ni, 54.3%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu10 : 39.5%us, 5.9%sy, 0.0%ni, 54.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu11 : 34.5%us, 7.2%sy, 0.0%ni, 58.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu12 : 41.1%us, 6.9%sy, 0.0%ni, 50.3%id, 0.0%wa, 0.0%hi, 1.6%si, 0.0%st
Cpu13 : 38.0%us, 7.3%sy, 0.0%ni, 54.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu14 : 36.2%us, 6.2%sy, 0.0%ni, 57.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu15 : 36.8%us, 8.2%sy, 0.0%ni, 54.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 28817004k total, 8008372k used, 20808632k free, 705772k buffers
Swap: 30867448k total, 0k used, 30867448k free, 4848376k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2641 postgres 16 0 8524m 21m 16m S 3.3 0.1 0:00.25 postmaster
2650 postgres 15 0 8524m 21m 16m S 3.3 0.1 0:00.25 postmaster
2706 postgres 16 0 8524m 21m 15m S 3.3 0.1 0:00.20 postmaster
2814 postgres 15 0 8523m 18m 14m S 3.3 0.1 0:00.10 postmaster
2829 postgres 15 0 8523m 18m 14m S 3.3 0.1 0:00.10 postmaster
2618 postgres 15 0 8524m 21m 16m S 3.0 0.1 0:00.25 postmaster
2639 postgres 15 0 8524m 21m 16m R 3.0 0.1 0:00.25 postmaster
2671 postgres 15 0 8524m 21m 16m S 3.0 0.1 0:00.23 postmaster
2675 postgres 16 0 8524m 21m 16m S 3.0 0.1 0:00.23 postmaster
2694 postgres 15 0 8524m 21m 15m S 3.0 0.1 0:00.23 postmaster
2698 postgres 15 0 8524m 21m 15m S 3.0 0.1 0:00.21 postmaster
2702 postgres 15 0 8524m 21m 15m S 3.0 0.1 0:00.19 postmaster
2767 postgres 15 0 8524m 20m 14m S 3.0 0.1 0:00.13 postmaster
2776 postgres 15 0 8524m 20m 14m S 3.0 0.1 0:00.14 postmaster
2812 postgres 15 0 8523m 18m 14m S 3.0 0.1 0:00.11 postmaster
2819 postgres 15 0 8523m 18m 14m S 3.0 0.1 0:00.09 postmaster
2823 postgres 16 0 8523m 18m 14m S 3.0 0.1 0:00.09 postmaster
2828 postgres 15 0 8523m 18m 14m S 3.0 0.1 0:00.09 postmaster
2631 postgres 15 0 8524m 21m 15m S 2.6 0.1 0:00.24 postmaster
2643 postgres 15 0 8524m 21m 15m S 2.6 0.1 0:00.23 postmaster
2656 postgres 15 0 8524m 21m 15m S 2.6 0.1 0:00.22 postmaster
2658 postgres 16 0 8524m 21m 15m S 2.6 0.1 0:00.22 postmaster
2664 postgres 16 0 8524m 21m 16m S 2.6 0.1 0:00.24 postmaster
2674 postgres 15 0 8524m 21m 15m S 2.6 0.1 0:00.23 postmaster
2679 postgres 15 0 8524m 21m 15m S 2.6 0.1 0:00.22 postmaster
2684 postgres 15 0 8524m 21m 15m S 2.6 0.1 0:00.21 postmaster
2695 postgres 15 0 8524m 20m 15m S 2.6 0.1 0:00.18 postmaster
2699 postgres 15 0 8524m 20m 15m S 2.6 0.1 0:00.18 postmaster
2703 postgres 15 0 8524m 20m 15m S 2.6 0.1 0:00.18 postmaster
2704 postgres 15 0 8524m 20m 15m R 2.6 0.1 0:00.17 postmaster
2713 postgres 15 0 8524m 20m 15m S 2.6 0.1 0:00.18 postmaster
2734 postgres 15 0 8524m 20m 14m S 2.6 0.1 0:00.14 postmaster
2738 postgres 15 0 8524m 20m 15m S 2.6 0.1 0:00.14 postmaster
On Mon, Jan 11, 2010 at 6:44 AM, Bob Dusek <redusek@xxxxxxxxx> wrote:What method of striped RAID? RAID-5? RAID-10? RAID-4? RAID-0?
> Hello,
>
> We're running Postgres 8.4.2 on Red Hat 5, on pretty hefty hardware...
>
> 4X E7420 Xeon, Four cores (for a total of 16 cores)
> 2.13 GHz, 8M Cache, 1066 Mhz FSB
> 32 Gigs of RAM
> 15 K RPM drives in striped raid
RAID-0
Definte "a lot".
> Things run fine, but when we get a lot of concurrent queries running, we see
> a pretty good slow down.
We have an application server that is processing requests. Each request consists of a combination of selects, inserts, and deletes. We actually see degredation when we get more than 40 concurrent requests. The exact number of queries executed by each request isn't known. It varies per request. But, an example request would be about 16 inserts and 113 selects. Any given request can't execute more than a single query at a time.
To be more specific about the degradation, we've set the "log_min_duration_statement=200", and when we run with 40 concurrent requests, we don't see queries showing up in there. When we run with 60 concurrent requests, we start seeing queries show up, and when we run 200+ requests, we see multi-second queries.
This is to be expected, to some extent, as we would expect some perfromance degradation with higher utilization. But, the hardware doesn't appear to be very busy, and that's where we're hoping for some help.
We want to have Postgres eating up as many resources as possible to chug through our queries faster. Right now, it's running slower with more utilization, but there's still too much idle time for the CPUs.
> We don't have much experience with this sort of hardware. Does anyone haveWhat do the following commands tell you?
> an example config file we could use as a good starting point for this sort
> of hardware?
>
> We have a fair amount of row-level inserts and deletes going on (probably as
> much data is inserted and deleted in a day than permanently resides in the
> db).
iostat -x 10 (first iteration doesn't count)
Here's some iostat output (from the 3rd data point from iostat -x 10)... this was taken while we were processing 256 simultaneous requests.
avg-cpu: %user %nice %system %iowait %steal %idle
34.29 0.00 7.09 0.03 0.00 58.58
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 112.20 0.00 133.40 0.00 1964.80 14.73 0.42 3.17 0.04 0.48
sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda4 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda5 0.00 112.20 0.00 133.40 0.00 1964.80 14.73 0.42 3.17 0.04 0.48
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.40 0.00 3.20 8.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 99.90 0.00 799.20 8.00 0.15 1.50 0.01 0.12
dm-3 0.00 0.00 0.00 0.60 0.00 4.80 8.00 0.00 0.33 0.17 0.01
dm-4 0.00 0.00 0.00 144.70 0.00 1157.60 8.00 0.46 3.17 0.02 0.35
dm-5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
The iowait seems pretty low, doesn't it?
vmstat 10 (again, first iteration doesn't count)
Here's the vmstat output, with the 6th data element clipped out, which seems representative of the whole... (also taken during 256 simultaneous request)
[root@ecpe1 pg_log]# vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
8 0 0 21818136 696692 3871172 0 0 0 1042 2261 48418 34 7 59 0 0
That's a lot of free mem, which is to be expected. Our database is not very large.
top
(taken during 256 simultaneous requests)
top - 10:11:43 up 12 days, 20:48, 6 users, load average: 10.48, 4.16, 2.83
Tasks: 798 total, 8 running, 790 sleeping, 0 stopped, 0 zombie
Cpu0 : 33.3%us, 7.6%sy, 0.0%ni, 59.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 31.6%us, 5.9%sy, 0.0%ni, 62.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 33.0%us, 6.6%sy, 0.0%ni, 60.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 35.4%us, 6.2%sy, 0.0%ni, 58.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu4 : 36.3%us, 5.6%sy, 0.0%ni, 58.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 37.4%us, 6.2%sy, 0.0%ni, 56.1%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu6 : 38.1%us, 6.0%sy, 0.0%ni, 56.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 39.2%us, 7.5%sy, 0.0%ni, 52.9%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu8 : 35.5%us, 7.2%sy, 0.0%ni, 56.9%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu9 : 37.8%us, 7.6%sy, 0.0%ni, 54.3%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu10 : 39.5%us, 5.9%sy, 0.0%ni, 54.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu11 : 34.5%us, 7.2%sy, 0.0%ni, 58.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu12 : 41.1%us, 6.9%sy, 0.0%ni, 50.3%id, 0.0%wa, 0.0%hi, 1.6%si, 0.0%st
Cpu13 : 38.0%us, 7.3%sy, 0.0%ni, 54.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu14 : 36.2%us, 6.2%sy, 0.0%ni, 57.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu15 : 36.8%us, 8.2%sy, 0.0%ni, 54.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 28817004k total, 8008372k used, 20808632k free, 705772k buffers
Swap: 30867448k total, 0k used, 30867448k free, 4848376k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2641 postgres 16 0 8524m 21m 16m S 3.3 0.1 0:00.25 postmaster
2650 postgres 15 0 8524m 21m 16m S 3.3 0.1 0:00.25 postmaster
2706 postgres 16 0 8524m 21m 15m S 3.3 0.1 0:00.20 postmaster
2814 postgres 15 0 8523m 18m 14m S 3.3 0.1 0:00.10 postmaster
2829 postgres 15 0 8523m 18m 14m S 3.3 0.1 0:00.10 postmaster
2618 postgres 15 0 8524m 21m 16m S 3.0 0.1 0:00.25 postmaster
2639 postgres 15 0 8524m 21m 16m R 3.0 0.1 0:00.25 postmaster
2671 postgres 15 0 8524m 21m 16m S 3.0 0.1 0:00.23 postmaster
2675 postgres 16 0 8524m 21m 16m S 3.0 0.1 0:00.23 postmaster
2694 postgres 15 0 8524m 21m 15m S 3.0 0.1 0:00.23 postmaster
2698 postgres 15 0 8524m 21m 15m S 3.0 0.1 0:00.21 postmaster
2702 postgres 15 0 8524m 21m 15m S 3.0 0.1 0:00.19 postmaster
2767 postgres 15 0 8524m 20m 14m S 3.0 0.1 0:00.13 postmaster
2776 postgres 15 0 8524m 20m 14m S 3.0 0.1 0:00.14 postmaster
2812 postgres 15 0 8523m 18m 14m S 3.0 0.1 0:00.11 postmaster
2819 postgres 15 0 8523m 18m 14m S 3.0 0.1 0:00.09 postmaster
2823 postgres 16 0 8523m 18m 14m S 3.0 0.1 0:00.09 postmaster
2828 postgres 15 0 8523m 18m 14m S 3.0 0.1 0:00.09 postmaster
2631 postgres 15 0 8524m 21m 15m S 2.6 0.1 0:00.24 postmaster
2643 postgres 15 0 8524m 21m 15m S 2.6 0.1 0:00.23 postmaster
2656 postgres 15 0 8524m 21m 15m S 2.6 0.1 0:00.22 postmaster
2658 postgres 16 0 8524m 21m 15m S 2.6 0.1 0:00.22 postmaster
2664 postgres 16 0 8524m 21m 16m S 2.6 0.1 0:00.24 postmaster
2674 postgres 15 0 8524m 21m 15m S 2.6 0.1 0:00.23 postmaster
2679 postgres 15 0 8524m 21m 15m S 2.6 0.1 0:00.22 postmaster
2684 postgres 15 0 8524m 21m 15m S 2.6 0.1 0:00.21 postmaster
2695 postgres 15 0 8524m 20m 15m S 2.6 0.1 0:00.18 postmaster
2699 postgres 15 0 8524m 20m 15m S 2.6 0.1 0:00.18 postmaster
2703 postgres 15 0 8524m 20m 15m S 2.6 0.1 0:00.18 postmaster
2704 postgres 15 0 8524m 20m 15m R 2.6 0.1 0:00.17 postmaster
2713 postgres 15 0 8524m 20m 15m S 2.6 0.1 0:00.18 postmaster
2734 postgres 15 0 8524m 20m 14m S 2.6 0.1 0:00.14 postmaster
2738 postgres 15 0 8524m 20m 15m S 2.6 0.1 0:00.14 postmaster
What you're looking for is iowait / utilization.