Thanks for these suggestions. I am beginning to wonder if the issue is
deeper. I set autovacuum to off, then turned off all the connections to
the database and did a manual vacuum just to see how long it takes. That
was last night my time; when I woke up this morning it still had not
finished. The maintenance_work_mem given to the DB for this process was
2GB. There is nothing else going on on the server! Now even REINDEX is
failing in the middle:

# REINDEX INDEX new_idx_userid;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

What else could be wrong?

On Mon, Apr 18, 2011 at 2:38 AM, Scott Marlowe <scott.marlowe@xxxxxxxxx> wrote:
> On Sun, Apr 17, 2011 at 10:59 AM, Phoenix <phoenix.kiula@xxxxxxxxx> wrote:
>> TOP does not show much beyond "postmaster". How should I use TOP and
>> what info can I give you? This is what it looks like:
>
> We're basically looking to see if the postmaster process doing the
> vacuuming or reindexing is stuck in a D state, which means it's
> waiting on IO. Hit the c key while it's running and you should get a
> little more info on which processes are what.
>
>>  4799 postgres  15   0  532m  94m  93m D  0.7  1.2  0:00.14 postmaster
>
> That is likely the postmaster that is waiting on IO.
>
>> VMSTAT 10 shows this:
>>
>>  r  b   swpd   free   buff    cache  si so   bi  bo   in     cs us sy id wa
>>  3 14  99552  17900  41108  7201712   0  0   42  11    0      0  8 34 41 16
>>  2 17  99552  16468  41628  7203012   0  0 1326  84 1437 154810  7 66 12 15
>>  3  7  99476  16796  41056  7198976   0  0 1398  96 1453 156211  7 66 21  6
>>  3 17  99476  17228  39132  7177240   0  0 1325  68 1529 156111  8 65 16 11
>
> So we're at 11 to 15% IO wait. I'm gonna guess you have 8 cores /
> threads in your CPUs, and 1/8th of 100% is 12%, so it looks like
> you're probably IO bound here. iostat tells us more:
>
>> The results of "iostat -xd 10" are:
>>
>> Device:  rrqm/s wrqm/s    r/s  w/s  rsec/s wsec/s   rkB/s  wkB/s avgrq-sz avgqu-sz await svctm  %util
>> sda        0.00   7.41   0.30 3.50    2.40  87.29    1.20  43.64    23.58     0.13 32.92 10.03   3.81
>> sdb        0.00   0.00   0.00 0.00    0.00   0.00    0.00   0.00     0.00     0.00  0.00  0.00   0.00
>> sdc        0.00  18.32 158.26 4.10 2519.32 180.98 1259.66  90.49    16.63    13.04 79.91  6.17 100.11
>
> 100% IO utilization, so yeah, it's likely that your sdc drive is your
> bottleneck. Given how little data is actually moving through the sdc
> drive, it's not very fast.
>
>> 8GB memory in total. 1GB devoted to PGSQL during these operations.
>> Otherwise, my settings are as follows (and yes I did make the vacuum
>> settings more aggressive based on your email, which has had no
>> apparent impact) --
>
> Yeah, as it gets more aggressive it can use more of your IO bandwidth.
>
>> What else can I share?
>
> That's a lot of help. I'm assuming you're running software or
> motherboard fake-raid on this RAID-1 set? I'd suggest buying a $500
> or so battery-backed caching RAID controller first; the improvements
> in performance are huge with such a card. You might wanna try testing
> the current RAID-1 set with bonnie++ to get an idea of how fast it is.
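
A few quick checks that might narrow this down. The "server closed the
connection unexpectedly" during REINDEX means a backend process died,
and with maintenance_work_mem at 2GB on an 8GB box the Linux OOM killer
is a plausible suspect. A rough way to check, assuming a Linux server
with standard kernel logging (the PostgreSQL log path below is a
placeholder; it varies by distro and install):

    $ dmesg | grep -iE 'out of memory|oom|killed process'
    $ grep -i 'terminated by signal' /path/to/postgresql.log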
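
To tell whether the overnight VACUUM is actually advancing rather than
stuck, VACUUM VERBOSE reports progress table by table and index by
index, and pg_stat_activity shows when the running statement started.
A minimal sketch, assuming a 9.0-era server ("mytable" is a
placeholder; on 9.2 and later the columns are pid and query instead of
procpid and current_query):

    # VACUUM VERBOSE mytable;
    # SELECT procpid, query_start, current_query FROM pg_stat_activity;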
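
To catch processes sitting in the D (uninterruptible IO wait) state
Scott describes, without watching top interactively, something like
this works on a typical Linux box with procps:

    $ ps -eo state,pid,wchan:30,cmd | grep '^D'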
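
And for the bonnie++ test Scott suggests, a minimal run against the
RAID-1 set might look like the following; the user and test directory
are placeholders, and the test size (in MB here) should be roughly
twice RAM, about 16GB on this box, so the OS cache can't hide the
disks. The -n 0 skips the small-file creation tests:

    $ bonnie++ -u postgres -d /mnt/raid_test -s 16384 -n 0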