Re: Incredibly slow restore times after 9.0>9.2 upgrade

jmcdonagh <Joseph.E.McDonagh@xxxxxxxxx> · Wed, 29 Oct 2014 08:12:23 -0700 (PDT)

Hi Tomas- thank you for your thoughtful response!

Tomas Vondra wrote
> On 28.10.2014 21:55, jmcdonagh wrote:
>> Hi, we have a nightly job that restores current production data to
>> the development databases in a 'warm spare' database so that if the
>> developers need fresh data, it's ready during the day. When we moved
>> from 9.0 to 9.2 suddenly the restores began to take from a few hours
>> to more like 15 hours or so. We're in Amazon EC2, I've tried new EBS
>> volumes, warmed them up, threw IOPS at them, pretty much all the
>> standard stuff to get more disk performance.
> 
> So, if I understand it correctly, you've been restoring into 9.0, then
> you switched to 9.2 and it's much slower?

Yes- but since the move was done utilizing snapshots so the move involves
new volumes, but I have created new volumes since then to rule out a single
bad volume.

Tomas Vondra wrote
> Is the 9.2 configured equally to 9.0? If you do something like this
> 
>   SELECT name, setting
>     FROM pg_settings
>    WHERE source = 'configuration file';
> 
> on both versions, what do you get?

I no longer have the 9.0 box up but we do track configuration via puppet and
git. The only configuration change made for 9.2 is:

-#standard_conforming_strings = off
+standard_conforming_strings = off

Cause we have an old app that needs this setting on otherwise we'd spend a
lot of time trying to fix it.

Tomas Vondra wrote
>> Here's the thing, the disk isn't saturated. The behavior I'm seeing 
>> seems very odd to me; I'm seeing the source disk which holds the dump
>> saturated by reads, which is great, but then I just see nothing being
>> written to the postgres volume. Just nothing happening, then a
>> small burst. There is no write queue backup on the destination disk
>> either. if I look at pg_stat_activity I'll see something like:
>> 
>> COPY salesforce_reconciliation (salesforce_id, email,
>> advisor_salesforce_id, processed) FROM stdin
>> 
>> and even for small tables, that seems to take a very long time even
>> though the destination disk is almost at 0 utilization.
> 
> So, where's the bottleneck? Clearly, there's one, so is it a CPU, a disk
> or something else? Or maybe network, because you're using EBS?
> 
> What do you mean by 'utilization'? How do you measure that?

The bottleneck is I/O somehow. I say somehow, because I see iowait averaging
about 50% between two CPUs, but there is just no writes to the destination
EBS volume really happening, just reads from the disk where the source dump
is located, then bursts of writes to the destination volume every so often.
It's kind of puzzling. This is happening on multiple database servers, in
multiple availability zones. Driving me bonkers.

What I mean by utilization is util% from iostat -m -x 1.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Incredibly-slow-restore-times-after-9-0-9-2-upgrade-tp5824701p5824847.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance