Re: Incredibly slow restore times after 9.0>9.2 upgrade

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Tomas Vondra wrote
> On 29.10.2014 16:12, jmcdonagh wrote:
>> Hi Tomas- thank you for your thoughtful response!
>> 
>> 
>> Tomas Vondra wrote
>>> On 28.10.2014 21:55, jmcdonagh wrote:
>>>> Hi, we have a nightly job that restores current production data to
>>>> the development databases in a 'warm spare' database so that if the
>>>> developers need fresh data, it's ready during the day. When we moved
>>>> from 9.0 to 9.2 suddenly the restores began to take from a few hours
>>>> to more like 15 hours or so. We're in Amazon EC2, I've tried new EBS
>>>> volumes, warmed them up, threw IOPS at them, pretty much all the
>>>> standard stuff to get more disk performance.
>>>
>>> So, if I understand it correctly, you've been restoring into 9.0, then
>>> you switched to 9.2 and it's much slower?
>> 
>> Yes- but since the move was done utilizing snapshots so the move
>> involves new volumes, but I have created new volumes since then to
>> rule out a single bad volume.
> 
> My advice would be to do some basic low-level performance tests to rule
> this out. Use dd or (better) fio to test basic I/O performance, it's
> much easier to spot issues that way.

I've done dd tests and the volumes perform fine. 


Tomas Vondra wrote
>> Tomas Vondra wrote
>>> Is the 9.2 configured equally to 9.0? If you do something like this
>>>
>>>   SELECT name, setting
>>>     FROM pg_settings
>>>    WHERE source = 'configuration file';
>>>
>>> on both versions, what do you get?
>> 
>> I no longer have the 9.0 box up but we do track configuration via
>> puppet and git. The only configuration change made for 9.2 is:
>> 
>> -#standard_conforming_strings = off
>> +standard_conforming_strings = off
> 
> Compared to 9.0, I suppose? Anyway, post the non-default config values
> at least for 9.2, please.

Yea, so in comparison to the only change was that. Here are the non-default
settings (some of them are probably defaults, but these are the uncommented
lines from postgresql.conf):

data_directory = '/mnt/postgresql/9.2/main'      # use data in another
directory
hba_file = '/etc/postgresql/9.2/main/pg_hba.conf'   # host-based
authentication file
ident_file = '/etc/postgresql/9.2/main/pg_ident.conf'   # ident
configuration file
external_pid_file = '/var/run/postgresql/9.2-main.pid'      # write an extra
PID file
listen_addresses = '*'        # what IP address(es) to listen on;
port = 5432            # (change requires restart)
max_connections = 300         # (change requires restart)
unix_socket_directory = '/var/run/postgresql'      # (change requires
restart)
ssl = true            # (change requires restart)
shared_buffers = 4GB         # min 128kB
temp_buffers = 128MB         # min 800kB
work_mem = 256MB            # min 64kB
maintenance_work_mem = 256MB      # min 1MB
wal_buffers = 512kB         # min 32kB
commit_delay = 50000         # range 0-100000, in microseconds
commit_siblings = 1         # range 1-1000
random_page_cost = 2.0         # same scale as above
effective_cache_size = 16GB
from_collapse_limit = 10
join_collapse_limit = 10      # 1 disables collapsing of explicit
log_destination = 'stderr'      # Valid values are combinations of
client_min_messages = warning      # values in order of decreasing detail:
log_min_messages = warning      # values in order of decreasing detail:
log_min_duration_statement = 1000   # -1 is disabled, 0 logs all statements
log_line_prefix = '%t '         # special values:
autovacuum = on         # Enable autovacuum subprocess?  'on'
datestyle = 'iso, mdy'
timezone = EST5EDT # actually, defaults to TZ environment
client_encoding = sql_ascii      # actually, defaults to database
lc_messages = 'en_US.UTF-8'         # locale for system error message
lc_monetary = 'en_US.UTF-8'         # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'         # locale for number formatting
lc_time = 'en_US.UTF-8'            # locale for time formatting
default_text_search_config = 'pg_catalog.english'
standard_conforming_strings = off


Tomas Vondra wrote
>> Tomas Vondra wrote
>>>> Here's the thing, the disk isn't saturated. The behavior I'm seeing 
>>>> seems very odd to me; I'm seeing the source disk which holds the dump
>>>> saturated by reads, which is great, but then I just see nothing being
>>>> written to the postgres volume. Just nothing happening, then a
>>>> small burst. There is no write queue backup on the destination disk
>>>> either. if I look at pg_stat_activity I'll see something like:
>>>>
>>>> COPY salesforce_reconciliation (salesforce_id, email,
>>>> advisor_salesforce_id, processed) FROM stdin
>>>>
>>>> and even for small tables, that seems to take a very long time even
>>>> though the destination disk is almost at 0 utilization.
>>>
>>> So, where's the bottleneck? Clearly, there's one, so is it a CPU, a
>>> disk or something else? Or maybe network, because you're using EBS?
>>>
>>> What do you mean by 'utilization'? How do you measure that?
>> 
>> The bottleneck is I/O somehow. I say somehow, because I see iowait 
>> averaging about 50% between two CPUs, but there is just no writes to 
>> the destination EBS volume really happening, just reads from the
>> disk where the source dump is located, then bursts of writes to the
>> destination volume every so often. It's kind of puzzling. This is
>> happening on multiple database servers, in multiple availability
>> zones. Driving me bonkers.
>> 
>> What I mean by utilization is util% from iostat -m -x 1.
> 
> I find this rather contradictory. At one moment you say the disk isn't
> saturated, the next moment you say you're I/O bound.
> 
> Also, iowait (as reported e.g. by 'top') is tricky to interpret
> correctly, especially on multi-cpu systems (nice intro to the complexity
> [1]). It's really difficult to interpret the 50% iowait without more
> info about what's happening on the machine.
> 
> IMHO, the utilization (as reported by iotop) is much easier to
> interpret, because it means '% of time the device was servicing
> requests'. It has issues too, because 100% does not mean 'saturated'
> (especially on RAID arrays that can service multiple requests in
> parallel), but it's better than iowait.
> 
> If I had to guess based from your info, I'd bet you're CPU bound, so
> there's very little idle time and about 50% of it is spent waiting for
> I/O requests (hence the 50% iowait). But in total the amount of I/O is
> very small, so %util is ~0.
> 
> Please, post a few lines of 'iostat -x -k 1' output. Samples from 'top'
> and 'vmstat 1' would be handy too.
> 
> regards
> Tomas

Well I'm confused too by this whole thing which is why I came here. I can
gather those statistics but I have a quick short question, could this be
caused by frivolous indexes or something like that?



--
View this message in context: http://postgresql.1045698.n5.nabble.com/Incredibly-slow-restore-times-after-9-0-9-2-upgrade-tp5824701p5825657.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux