2009/5/29 Scott Carey <scott@xxxxxxxxxxxxxxxxx>
One of the servers is an Intel Xeon X7350 2.93GHz running RH 5.3 with kernel 2.6.18-128.el5,
and the performance is bad there too, so I don't think the problem is the kernel.
The two servers that I tested (HP-785 Opteron and IBM x3950 M2 Xeon) have a NUMA architecture, and I thought the problem was caused by NUMA:
http://archives.postgresql.org/pgsql-admin/2008-11/msg00157.php
I'm now trying another server, an HP blade bl 680 with Xeon E7450 (4 CPUs x 6 cores = 24 cores) without a NUMA architecture, but CPU usage climbs there as well.
 procs -----------memory------------ -swap ---io--- --system--- -----cpu------
  r  b swpd     free   buff    cache si so  bi   bo   in     cs us sy id wa st
  1  0    0 46949972 116908 17032964  0  0  15   31    2      2  1  0 98  0  0
  2  0    0 46945880 116916 17033068  0  0  72  140 2059   3140  1  1 97  0  0
329  0    0 46953260 116932 17033208  0  0  24  612 1435 194237 44  3 53  0  0
546  0    0 46952912 116940 17033208  0  0   4  136 1090 327047 96  4  0  0  0
562  0    0 46951052 116940 17033224  0  0   0    0 1095 323034 95  4  0  0  0
514  0    0 46949200 116952 17033212  0  0   0  224 1088 330178 96  3  1  0  0
234  0    0 46948456 116952 17033212  0  0   0    0 1106 315359 91  5  4  0  0
  4  0    0 46958376 116968 17033272  0  0  16  396 1379 223499 47  3 49  0  0
  1  1    0 46941644 116976 17033224  0  0 152 1140 2662   5540  4  2 93  1  0
  1  0    0 46943196 116984 17033248  0  0 104  604 2307   3992  4  2 94  0  0
  1  1    0 46931544 116996 17033568  0  0 104 4304 2318   3585  1  1 97  1  0
  0  0    0 46943572 117004 17033568  0  0  32  204 2007   2986  1  1 98  0  0
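For anyone who wants to pick the spike out of a longer vmstat capture automatically, here is a rough Python sketch. It assumes the 17-column layout shown in the header above; the function names, the 24-core figure used as the run-queue limit, and the 100000 context-switch threshold are my own illustrative choices, not anything vmstat or PostgreSQL defines.

```python
# Column names taken from the vmstat header line in the output above.
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()


def parse_vmstat(text):
    """Return one dict per numeric sample line, skipping header lines."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Keep only lines that have exactly one integer per column.
        if len(fields) != len(COLUMNS) or not all(f.isdigit() for f in fields):
            continue
        samples.append(dict(zip(COLUMNS, map(int, fields))))
    return samples


def overloaded(samples, cores=24, cs_limit=100_000):
    """Samples where the run queue exceeds the core count and the
    context-switch rate exceeds cs_limit (an arbitrary, assumed threshold)."""
    return [s for s in samples if s["r"] > cores and s["cs"] > cs_limit]


# A few lines copied from the capture above.
SAMPLE = """\
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 0 46945880 116916 17033068 0 0 72 140 2059 3140 1 1 97 0 0
329 0 0 46953260 116932 17033208 0 0 24 612 1435 194237 44 3 53 0 0
546 0 0 46952912 116940 17033208 0 0 4 136 1090 327047 96 4 0 0 0
4 0 0 46958376 116968 17033272 0 0 16 396 1379 223499 47 3 49 0 0
"""

for s in overloaded(parse_vmstat(SAMPLE)):
    print(s["r"], s["cs"], s["us"])
```

On the four sample lines this flags the two where hundreds of processes are runnable; the line with r=4 is skipped even though its context-switch count is still high, because the run queue is already below the core count.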
Now I don't think the problem is NUMA.
The development team will fix the application and then I will test again.
I believe that once the application closes its connections the problem could be solved, and then a server with 16 cores could do the work instead of one with 32 or 24.
Regards...
--Fabrix
On 5/28/09 6:54 PM, "Greg Smith" <gsmith@xxxxxxxxxxxxx> wrote:
> 2) You have very new hardware and a very old kernel. Once you've done the
> above, if you're still not happy with performance, at that point you
> should consider using a newer one. It's fairly simple to build a Linux
> kernel using the same basic kernel parameters as the stock RedHat one.
> 2.6.28 is six months old now, is up to 2.6.28.10, and has gotten a lot
> more testing than most kernels due to it being the Ubuntu 9.04 default.
> I'd suggest you try out that version.
Comparing RedHat's 2.6.18, heavily patched, fix backported kernel to the
original 2.6.18 is really hard. Yes, much of it is old, but a lot of stuff
has been backported.
I have no idea if things related to this case have been backported. Virtual
memory management is complex, however, and likely only bug fixes would go in.
But RedHat 5.3 for example put all the new features for Intel's latest
processor in the release (which may not even be in 2.6.28!).
There are operations/IT people who won't touch Ubuntu etc. with a ten foot pole
yet for production. That may be irrational, but such paranoia exists. The
latest postgres release is generally a hell of a lot safer than the latest
linux kernel, and people get paranoid about their DB.
If you told someone who gets paged awake at 3AM whenever the system has an
error that "oh, we patched our own kernel build into the RedHat OS" they
might not be ok with that.
It's a good test to see if this problem is fixed in the kernel. I've seen
CentOS 5.2 go completely nuts with system CPU time and context switches with
kswapd many times before. I haven't put the system under the same stress
with 5.3 yet however.