On Sat, 16 Feb 2008, Phoenix Kiula wrote:
The script you suggested doesn't work:

tmp > ./trackusage.sh
-bash: ./trackusage.sh: /bin/sh: bad interpreter: Permission denied
Try changing the first line to #!/bin/bash
Anyway, I did the vmstat command. I was running it while the system was ok, then not ok, then ok...and so on. So I hope these numbers have captured what the issue is:

tmp > vmstat 10 60
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0   3380 323140 115248 2995560    0    0     0    60 1097   153  1  0 97  2
 0  0   3380 280260 115272 2995536    0    0     0    77 1087   133  1  0 98  1
 0  0   3380 200580 115296 2995512    0    0     2    65 1089   140  1  0 97  2
 0  0   3380  81916 115392 2995676    0    0    17    82 1089   188  2  1 94  2
 0  0   3380  16980  98072 2974256    0    0    48   122 1102   190  2  1 95  3
 1  0   3380  21588  73160 2954708    0    0    86   274 1128   276  2  2 88  8
 0  0   3380  52692  57860 2932048    0    0     1   128 1106   211  2  1 95  3
 0  0   3380 184748  57960 2931948    0    0     6   219 1128   451  2  1 92  5
 0  0   3380 342996  58016 2931892    0    0     0   140 1122   465  2  1 94  3
Looks like the worst spot was in the middle here. Something gobbled up over 300MB of memory in 40 seconds, enough to force the OS to blow away almost half its disk buffers just to keep working memory free. It wasn't bad enough to go to swap or invoke the OOM killer, but it was enough to push up the blocks written out (bo). I would guess the other ugly spots were the later portions where bo spiked above 100.
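If it helps to spot these swings without reading the raw numbers, here is a rough sketch that prints the change in free memory and buffers between consecutive vmstat samples and marks rows where bo is high. The function name vmstat_deltas and the default threshold of 100 are made up for illustration, and the column positions assume the 16-column layout shown above.

```shell
# Summarize vmstat output read from stdin. For each sample after the first,
# print the change in free memory and buffers (in KB) since the previous
# sample, plus the blocks-out value. Assumes free is column 4, buff is
# column 5, and bo is column 10, as in the output quoted in this thread.
vmstat_deltas() {
    local bo_limit="${1:-100}"   # hypothetical threshold for flagging bo
    awk -v limit="$bo_limit" '
        $1 !~ /^[0-9]+$/ { next }    # skip header and blank lines
        seen {
            printf "free %d KB, buff %d KB, bo %d%s\n",
                   $4 - free, $5 - buff, $10,
                   ($10 > limit ? " (high bo)" : "")
        }
        { free = $4; buff = $5; seen = 1 }
    '
}
```

Something like `vmstat 10 60 | vmstat_deltas 100` would then give one line per 10-second interval, with large negative free and buff values showing where memory is being eaten.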
But without knowing more about what the processes using this memory and generating the output I/O are doing, it's hard to say why. That's why I suggested you watch top with the command lines turned on for a bit, to see what process(es) are jumping around during the bad periods.
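If sitting in front of top isn't practical, one option is to log snapshots of it so they can be lined up against the vmstat samples afterwards. This is only a sketch: the function name log_top_snapshots and its defaults are invented here, and it assumes the procps version of top found on most Linux systems, where -b is batch mode, -c shows full command lines, and -n 1 takes a single snapshot.

```shell
# Print a timestamp followed by the first lines of top's batch output,
# repeated at a fixed interval. Both arguments are optional.
log_top_snapshots() {
    local interval="${1:-10}"   # seconds between snapshots
    local count="${2:-60}"      # number of snapshots to take
    local i=0
    while [ "$i" -lt "$count" ]; do
        date '+%Y-%m-%d %H:%M:%S'
        # The summary header plus roughly the top dozen processes.
        top -b -c -n 1 | head -n 20
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Running `log_top_snapshots 10 60 > top.log` alongside `vmstat 10 60` covers the same ten minutes, so the processes listed near a bad vmstat row can be checked by timestamp.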
-- * Greg Smith gsmith@xxxxxxxxxxxxx http://www.gregsmith.com Baltimore, MD