First of all, thanks for the response; answers below.
On Mon, Jul 15, 2013 at 4:12 PM, Kevin Grittner <kgrittn@xxxxxxxxx> wrote:
Rafael Domiciano <rafael.domiciano@xxxxxxxxx> wrote:
> CentOS release 6.3 (Final)
> PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc
> (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit

> Since 2 weeks I'm get stucked in a very strange situation: from
> time to time (sometimes with intervals less than 10 minutes), the
> server get "stucked"/"hang" (I dont know how to call it) and
> every connections on postgres (dont matter if it's SELECT,
> UPDATE, DELETE, INSERT, startup, authentication...) seems like
> get "paused"; after some seconds (say ~10 or ~15 sec, sometimes
> less) everything "goes OK".

During these episodes, do you see high system CPU time? If so, try
disabling transparent huge page support, and see whether it affects
the frequency or severity of the episodes.
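For reference, disabling THP at runtime is a one-line write to sysfs, but the path differs by kernel: RHEL/CentOS 6 kernels carry a redhat_ prefix, while mainline kernels use plain transparent_hugepage. A minimal sketch (run as root; the sysfs root is overridable here only so it can be dry-run safely):

```shell
#!/bin/sh
# Sketch: write "never" to whichever THP sysfs directory exists.
# RHEL/CentOS 6 uses redhat_transparent_hugepage; mainline kernels
# use transparent_hugepage. Run as root on the affected server.
disable_thp() {
    root="${1:-/sys}"   # override the sysfs root for a dry run
    for d in "$root/kernel/mm/redhat_transparent_hugepage" \
             "$root/kernel/mm/transparent_hugepage"; do
        if [ -d "$d" ]; then
            echo never > "$d/enabled"
            echo "wrote never to $d/enabled"
            return 0
        fi
    done
    echo "no THP sysfs directory found under $root" >&2
    return 1
}
```

Note this does not survive a reboot; to make it persistent, the same write is typically added to /etc/rc.local.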
Well, running mpstat 1 gives me the following:

         CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
08:27:48 all 5,44 0,00 3,97 0,59 0,03 0,03 0,00 0,00 89,93
08:27:49 all 7,61 0,00 3,22 3,13 0,00 0,06 0,00 0,00 85,97
08:27:50 all 2,54 0,00 24,23 0,06 0,00 0,00 0,00 0,00 73,17
08:27:51 all 1,76 0,00 33,33 0,19 0,00 0,00 0,00 0,00 64,72
08:27:52 all 5,07 0,00 23,63 4,10 0,00 0,06 0,00 0,00 67,14
08:27:53 all 0,34 0,00 17,89 3,07 0,00 0,00 0,00 0,00 78,70
08:27:54 all 0,06 0,00 14,94 0,03 0,00 0,03 0,00 0,00 84,93
08:27:55 all 4,64 0,00 4,64 3,41 0,03 0,09 0,00 0,00 87,19
08:27:56 all 9,27 0,00 2,29 3,76 0,03 0,03 0,00 0,00 84,62
08:27:57 all 3,32 0,00 15,49 1,82 0,00 0,03 0,00 0,00 79,34
08:27:58 all 0,09 0,00 16,67 0,31 0,00 0,00 0,00 0,00 82,92
Another sample:
[###@###~]# mpstat 1
Linux 2.6.32-279.14.1.el6.x86_64 (###.###) 16-07-2013 _x86_64_ (32 CPU)
08:37:50 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
08:37:51 all 4,85 0,00 15,37 1,60 0,00 0,03 0,00 0,00 78,15
08:37:52 all 4,70 0,00 22,89 0,06 0,00 0,03 0,00 0,00 72,32
08:37:53 all 0,97 0,00 21,55 0,03 0,00 0,00 0,00 0,00 77,45
08:37:54 all 0,53 0,00 19,54 0,03 0,00 0,00 0,00 0,00 79,89
08:37:55 all 0,19 0,00 13,24 0,91 0,03 0,06 0,00 0,00 85,57
08:37:56 all 6,56 0,00 1,91 7,00 0,00 0,16 0,00 0,00 84,37
08:37:57 all 3,72 0,00 0,47 6,29 0,00 0,00 0,00 0,00 89,51
08:37:58 all 5,35 0,00 0,66 3,79 0,00 0,03 0,00 0,00 90,17
Yeah, disabling THP seems to lower the severity of the situation. Thanks. Right now it's been about an hour without any episode.
Same problem here and same resolution: http://dba.stackexchange.com/questions/32890/postgresql-pg-stat-activity-shows-commit.
Googling, I found that others have had the same problem and resolved it by disabling THP. Is that the right way to go?
About disk activity, my baseline is the test that was done when the storage was installed/configured; at that test iostat showed around ~600 tps. During my episodes it was around ~300 tps.
The processors are 2x Intel Xeon E5-2690, giving a total of 32 threads.
About shared_buffers, I'm going to try different values and test.
Thanks,
Rafael Domiciano
> So, my first trial was to check disks. Running "iostat"
> apparently showed that disks was OK.

Did you run iostat during an episode of slowness? What did it
show? Characterizing it as "apparently OK" doesn't provide much
useful information. Are there any reports to show you when writing
was saturated?
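For what it's worth, extended iostat output during an episode is more telling than plain tps; a quick filter like the following (a sketch assuming sysstat's `iostat -dx` layout, with %util in the last column) flags devices that are near saturation:

```shell
# Sketch: flag devices whose %util exceeds 90 in `iostat -dx 1` output.
# Assumes sysstat's -dx column layout, where %util is the last field.
flag_saturated() {
    awk '$NF+0 > 90 { print $1, "util", $NF "%" }'
}
# On the live box, run during an episode:
#   iostat -dx 1 | flag_saturated
```

Sustained %util near 100, or await climbing well above svctm, would point at the storage rather than the CPU.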
> It's a Raid10, 4 600GB SAS, IBM Storage DS3512, over FC. IBM DS
> Storage Manager says that disks is OK.
> total used free shared buffers cached
> Mem: 145182 130977 14204 0 43 121407
> -/+ buffers/cache: 9526 135655
> Swap: 6143 65 6078
> Following is what I've tried:
> 1) Emre Hasegeli has suggested to reduce my shared buffers, but
> it's already low:
> total server memory: 141 GB
> shared_buffers: 16 GB

On a machine with nearly twice that RAM, I've had to decrease
shared_buffers to 2GB to avoid the symptoms you describe. That is
in conjunction with making the background writer more aggressive
and making sure the checkpoint completion target is set to 0.9.
> Maybe it's too low? I've been thinking to increase to 32 GB.

Well, you could try that; if the symptoms get worse, then you might
be willing to go the other direction....
> max_connections = 500 and ~400 connections average

How many cores (not "hardware threads") does the machine have? You
will probably have better throughput and latency if you use
connection pooling to limit the number of active database
transactions to somewhere around two times the number of cores, or
slightly above that.
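With 2x E5-2690 that's 16 physical cores, so the rule of thumb works out to roughly 32 active transactions. A PgBouncer sketch along those lines (database name, host, and sizes are illustrative placeholders):

```
; pgbouncer.ini sketch -- database name and host are placeholders
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction      ; limit concurrent transactions, not sessions
default_pool_size = 32       ; ~2x the 16 physical cores
max_client_conn = 500        ; clients can still open up to 500 connections
```

Transaction pooling lets the existing ~400 client connections stay open while only ~32 of them hold a server backend at any moment.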
--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company