On Tuesday, 27 November 2007, Scott Marlowe wrote:
> On Nov 24, 2007 10:57 AM, Peter Bauer <peter.bauer@xxxxxxxxxx> wrote:
> > I have a system here with two 2.4 GHz Xeon processors, 2 GB RAM, ONE
> > disk on a battery-backed write cache SCSI controller, and PostgreSQL
> > 8.1.4 running with the data on a DRBD device for high availability.
> > The database is also replicated to two similar machines with slony1.
>
> Why are you running a version of PostgreSQL with known data-eating
> bugs?  If you care for your data, you will keep up to date on
> releases.  8.1.10 was released on 2007-09-17; 8.1.4 was released on
> 2006-05-23.  That's 16 months of bug fixes you're missing.  Go here:
> http://www.postgresql.org/docs/8.1/static/release.html and read up on
> the fixes you're missing.  Then update.  Or just update.
>
> OK, on to the issue at hand.
>
> > Since the load average is between 1 (most of the time) and 10
> > (peaks), I am worried about the load, so I ran vmstat and iostat,
> > which show that 1000-6000 blocks are written per second.  Please
> > check the attached output for further details.  top shows that the
> > CPUs are at least 80% idle most of the time, so I think there is an
> > I/O bottleneck.  I'm aware that this hardware setup is probably not
> > sufficient, but I would like to investigate how critical the
> > situation is.
>
> Yes.  Battery-backed cache can only do so much; it's not magic pixie
> dust.  Once it's full, the drive becomes the bottleneck.  Real db
> servers have more than one disk drive.  They usually have at least 4
> or so, and often dozens to hundreds.  Also, not all battery-backed
> caching RAID controllers are created equal.
>
> > procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
> >  0  0 398256  78328 140612 1063556    0    0     0  1472  2029  5081  4  3 92  0
> >  0  2 398256  78268 140612 1063576    0    0     0  2304  1928  4216  0  2 98  0
> >  1  2 398256  78100 140612 1063576    0    0     0  1464  1716  3994  1  3 96  0
> >  0  0 398256  78704 140612 1063592    0    0     0   916  1435  3570  5  3 91  0
> >  0  0 398256  77876 140612 1063616    0    0     0     0   305  1169  3  1 96  0
>
> See that dip in the last line above, where blocks out drop to 0, idle
> jumps to 96, and interrupts and context switches drop?  That's most
> likely where PostgreSQL is checkpointing.  Checkpointing is where it
> writes out all the dirty buffers to disk.  If the bgwriter is not
> tuned aggressively enough, checkpoints happen and make the whole
> database slow down for a few seconds.  If it's tuned too aggressively,
> the db spends too much CPU time tracking the dirty buffers and then
> writing them.  If tuned just right, it will write out the dirty
> buffers just fast enough that a checkpoint is never needed.
>
> You tune the bgwriter to your machine and I/O subsystem.  If you're
> planning on getting more hard drives, do that first.  Then tune the
> bgwriter.
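Before touching anything I will double-check which point release is
actually running and what the bgwriter is currently set to.  A minimal
check from psql (version() and the pg_settings view are standard
facilities, nothing specific to this setup):

    -- exact server version, to confirm which 8.1.x is running
    SELECT version();

    -- current values of all five bgwriter_* parameters
    SELECT name, setting
      FROM pg_settings
     WHERE name LIKE 'bgwriter%';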
> btw, if this is "vmstat 1" running, it's showing a checkpoint every
> 20 or so seconds, I think:
>
> >  0  2 398256  79136 140612 1063964    0    0     0  1736  1959  4494  4  2 94  0
>
> checkpoint here:
>
> >  0  0 398256  79132 140612 1063964    0    0     0     4   260  1039  1  1 98  0
> >  0  0 398256  79052 140612 1063980    0    0     0  2444  3084  6955  6  5 89  0
> >  0  2 398256  79060 140612 1063988    0    0     0   948  1146  3616  3  1 96  0
> >  0  1 398256  78268 140612 1064056    0    0     0  1908  1809  4086  6  5 88  0
> >  0  1 398256  76728 140612 1064056    0    0     0  6256  6637 15472  5  5 90  0
> >  0  2 398256  77000 140612 1064064    0    0     0  4916  5840 12107  1  4 95  0
> >  0  2 398256  76956 140612 1064068    0    0     0  6468  7432 15211  1  3 96  0
> >  0  6 398256  77388 140612 1064072    0    0     0  8116  7826 18265  1  8 91  0
> >  0  2 398256  74312 140612 1064076    0    0     0  7032  6886 16136  2  7 91  0
> >  0  2 398256  74264 140612 1064076    0    0     0  5680  7143 13411  0  5 95  0
> >  0  2 398256  72980 140612 1064140    0    0     0  5396  6377 13251  6  6 88  0
> >  0  3 398256  76972 140612 1064148    0    0     0  5652  6793 14079  4  9 87  0
> >  0  2 398256  77836 140612 1064148    0    0     0  3968  5321 14187 10  8 82  0
> >  1  0 398256  77280 140612 1064148    0    0     0  1608  3188  8974 21 12 67  0
> >  1  0 398256  77832 140612 1064152    0    0     0   236   834  2625  7  5 87  0
> >  0  0 398256  77464 140612 1064152    0    0     0   244   505  1378  2  4 94  0
> >  1  0 398256  77828 140612 1064164    0    0     0   316   580  1954  7  2 91  0
> >  0  0 398256  77804 140612 1064180    0    0     0   740   673  2248  2  2 96  0
> >  0  0 398256  77000 140612 1064180    0    0     0   304   589  1739  1  3 96  0
>
> 20 rows later, checkpoint here:
>
> >  0  0 398256  77000 140612 1064184    0    0     0     0   216   886  0  1 99  0
> >  0  0 398256  75452 140612 1064184    0    0     0   432   755  2032  6  1 93  0
>
> > max_fsm_pages = 40000         # min max_fsm_relations*16, 6 bytes each, APUS
>
> This seems a little low for a busy server.
>
> > # - Background writer -
> >
> > bgwriter_delay = 100          # 10-10000 milliseconds between rounds, APUS
> > bgwriter_lru_percent = 2.0    # 0-100% of LRU buffers scanned/round, APUS
> > bgwriter_lru_maxpages = 10    # 0-1000 buffers max written/round, APUS
> > bgwriter_all_percent = 1      # 0-100% of all buffers scanned/round, APUS
> > bgwriter_all_maxpages = 10    # 0-1000 buffers max written/round, APUS
>
> So, the bgwriter wakes up 10 times a second, and each time it scans 2%
> of the least recently used pages and writes up to 10 of those pages.
> And it only checks 1% of the total pages and writes 10 of those at the
> most.  This is not aggressive enough, and given how much spare CPU you
> have left over, you can be a fair bit more aggressive.  The main thing
> to increase is the maxes.  Try changing them to the 100 to 300 range,
> and maybe increase your percentages to 5% or so.  What we're shooting
> for is to see those checkpoints go away.
>
> Then, when running your benchmark, after a few minutes, run a
> checkpoint by hand and see if you get one of those slowdowns like we
> saw in vmstat above.  If your bgwriter is tuned properly, you should
> get an almost instant response from the checkpoint and no noticeable
> slowdown in the vmstat numbers for context switches per second.
>
> Once you reach the point where the bgwriter is just keeping ahead of
> checkpoints, there's little to be gained from more aggressive tuning
> of the bgwriter, and you'll just be chewing up memory and CPU
> bandwidth if you do get too aggressive with it.

Hi Scott,

thank you for the great suggestions, I will keep the list informed.
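For the record, this is the postgresql.conf change I am going to
benchmark first.  The values are simply picked from the ranges you
suggested above (maxes into the 100-300 range, percentages around 5%),
so they are an untested starting point, not a recommendation:

    # - Background writer -  (first tuning attempt)
    bgwriter_delay = 100          # unchanged, 10 rounds/second
    bgwriter_lru_percent = 5.0    # was 2.0
    bgwriter_lru_maxpages = 200   # was 10
    bgwriter_all_percent = 5      # was 1
    bgwriter_all_maxpages = 200   # was 10

These should be picked up with a plain "pg_ctl reload", no restart
needed.  After a few minutes of benchmark load I will then issue a
manual

    CHECKPOINT;

from psql and watch "vmstat 1" to see whether the context switch and
blocks-out numbers still stall the way they do above.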
br,
Peter

--
Peter Bauer
APUS Software G.m.b.H.
A-8074 Raaba, Bahnhofstrasse 1/1
Email: peter.bauer@xxxxxxxxxx
Tel: +43 316 401629 24
Fax: +43 316 401629 9