Btw, hardware is not an issue. My db has been working fine for a while.
Smaller, poorer systems around the web run InnoDB databases, which I
wouldn't touch with a barge pole. I have a hardware RAID controller, not a
"fake" one. It's a good quality battery-backed 3Ware:

http://192.19.193.26/products/serial_ata2-9000.asp


On Mon, Apr 18, 2011 at 1:14 PM, Phoenix Kiula <phoenix.kiula@xxxxxxxxx> wrote:
> Thanks for these suggestions.
>
> I am beginning to wonder if the issue is deeper.
>
> I set autovacuum to off, then turned off all the connections to the
> database, and did a manual vacuum just to see how long it takes.
>
> This was last night my time. I woke up this morning and it has still
> not finished.
>
> The maintenance_work_mem given to the DB for this process was 2GB.
>
> There is nothing else going on on the server! Now, even REINDEX is
> just failing in the middle:
>
> # REINDEX INDEX new_idx_userid;
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> What else could be wrong?
>
>
> On Mon, Apr 18, 2011 at 2:38 AM, Scott Marlowe <scott.marlowe@xxxxxxxxx> wrote:
>> On Sun, Apr 17, 2011 at 10:59 AM, Phoenix <phoenix.kiula@xxxxxxxxx> wrote:
>>> TOP does not show much beyond "postmaster". How should I use TOP and
>>> what info can I give you? This is what it looks like:
>>
>> We're basically looking to see if the postmaster process doing the
>> vacuuming or reindexing is stuck in a D state, which means it's
>> waiting on IO. Hit the c key while top is running and you should get
>> a little more info on which processes are what.
>>
>>>  4799 postgres  15   0  532m  94m  93m D  0.7  1.2  0:00.14 postmaster
>>
>> That is likely the postmaster that is waiting on IO.
>>
>>> VMSTAT 10 shows this:
>>>
>>>  r  b   swpd   free  buff    cache  si  so    bi  bo    in      cs us sy id wa
>>>  3 14  99552  17900 41108  7201712   0   0    42  11     0       0  8 34 41 16
>>>  2 17  99552  16468 41628  7203012   0   0  1326  84  1437  154810  7 66 12 15
>>>  3  7  99476  16796 41056  7198976   0   0  1398  96  1453  156211  7 66 21  6
>>>  3 17  99476  17228 39132  7177240   0   0  1325  68  1529  156111  8 65 16 11
>>
>> So we're at 11 to 15% IO wait. I'm going to guess you have 8 cores /
>> threads in your CPUs, and 1/8th of 100% is about 12%, so it looks like
>> you're probably IO bound here. iostat tells us more:
>>
>>> The results of "iostat -xd 10" are:
>>>
>>> Device:  rrqm/s  wrqm/s     r/s   w/s   rsec/s  wsec/s    rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  svctm   %util
>>> sda        0.00    7.41    0.30  3.50     2.40   87.29     1.20  43.64     23.58      0.13  32.92  10.03    3.81
>>> sdb        0.00    0.00    0.00  0.00     0.00    0.00     0.00   0.00      0.00      0.00   0.00   0.00    0.00
>>> sdc        0.00   18.32  158.26  4.10  2519.32  180.98  1259.66  90.49     16.63     13.04  79.91   6.17  100.11
>>
>> 100% IO utilization, so yes, it's likely that your sdc drive is your
>> bottleneck. Given how little data is actually moving through the sdc
>> drive, it's not very fast.
>>
>>> 8GB memory in total. 1GB devoted to PGSQL during these operations.
>>> Otherwise, my settings are as follows (and yes, I did make the vacuum
>>> settings more aggressive based on your email, which has had no
>>> apparent impact) --
>>
>> Yeah, as it gets more aggressive it can use more of your IO bandwidth.
>> Since you
>>
>>> What else can I share?
>>
>> That's a lot of help. I'm assuming you're running software or
>> motherboard fake-raid on this RAID-1 set?
>> I'd suggest buying a $500 or so battery-backed caching RAID controller
>> first; the improvements in performance are huge with such a card. You
>> might want to try testing the current RAID-1 set with bonnie++ to get
>> an idea of how fast it is.
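
For reference, a bonnie++ run along these lines would give a baseline for
the sdc array. The test directory, file size, and user below are assumptions,
not values from this thread; the file size should be roughly twice RAM (so
about 16GB on this 8GB box) so the page cache can't absorb the test:

  # assumes /data lives on the sdc RAID-1 set and bonnie++ is installed
  bonnie++ -d /data/bonnie_tmp -s 16384 -u postgres -f -m dbserver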
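
To spot backends sitting in the D (uninterruptible IO wait) state that the
top output above hints at, a one-liner like this works on most Linux boxes
(the "postgres" match is just an assumption about the process name):

  # list postgres processes currently stuck in D state, plus the wait channel
  ps -eo pid,stat,wchan,cmd | awk '$2 ~ /D/ && /postgres/'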
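
The "server closed the connection unexpectedly" failure during REINDEX means
a backend died, and the server log (plus the kernel log, in case the OOM
killer was involved) should say why. The log path below is a common default
and only an assumption; it varies by install:

  # recent PostgreSQL server log entries (location varies: pg_log/, syslog, ...)
  tail -n 200 /var/lib/pgsql/data/pg_log/postgresql-*.log
  # check whether the kernel has been killing processes for lack of memory
  dmesg | grep -iE 'out of memory|oom'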