Tomas Vondra wrote
> On 29.10.2014 16:12, jmcdonagh wrote:
>> Hi Tomas- thank you for your thoughtful response!
>>
>> Tomas Vondra wrote
>>> On 28.10.2014 21:55, jmcdonagh wrote:
>>>> Hi, we have a nightly job that restores current production data to
>>>> the development databases in a 'warm spare' database so that if the
>>>> developers need fresh data, it's ready during the day. When we moved
>>>> from 9.0 to 9.2, the restores suddenly went from a few hours to more
>>>> like 15 hours or so. We're in Amazon EC2; I've tried new EBS
>>>> volumes, warmed them up, threw IOPS at them, pretty much all the
>>>> standard stuff to get more disk performance.
>>>
>>> So, if I understand it correctly, you've been restoring into 9.0, then
>>> you switched to 9.2 and it's much slower?
>>
>> Yes, but the move was done using snapshots, so it already involved new
>> volumes, and I have created new volumes since then to rule out a
>> single bad volume.
>
> My advice would be to do some basic low-level performance tests to rule
> this out. Use dd or (better) fio to test basic I/O performance; it's
> much easier to spot issues that way.

I've done dd tests and the volumes perform fine.

Tomas Vondra wrote
>> Tomas Vondra wrote
>>> Is the 9.2 configured equally to 9.0? If you do something like this
>>>
>>>   SELECT name, setting
>>>     FROM pg_settings
>>>    WHERE source = 'configuration file';
>>>
>>> on both versions, what do you get?
>>
>> I no longer have the 9.0 box up, but we do track configuration via
>> puppet and git. The only configuration change made for 9.2 is:
>>
>> -#standard_conforming_strings = off
>> +standard_conforming_strings = off
>
> Compared to 9.0, I suppose? Anyway, post the non-default config values
> at least for 9.2, please.

Yes, compared to 9.0 the only change was that one. Here are the
non-default settings (some of them are probably defaults, but these are
the uncommented lines from postgresql.conf):

data_directory = '/mnt/postgresql/9.2/main'            # use data in another directory
hba_file = '/etc/postgresql/9.2/main/pg_hba.conf'      # host-based authentication file
ident_file = '/etc/postgresql/9.2/main/pg_ident.conf'  # ident configuration file
external_pid_file = '/var/run/postgresql/9.2-main.pid' # write an extra PID file
listen_addresses = '*'                 # what IP address(es) to listen on;
port = 5432                            # (change requires restart)
max_connections = 300                  # (change requires restart)
unix_socket_directory = '/var/run/postgresql'   # (change requires restart)
ssl = true                             # (change requires restart)
shared_buffers = 4GB                   # min 128kB
temp_buffers = 128MB                   # min 800kB
work_mem = 256MB                       # min 64kB
maintenance_work_mem = 256MB           # min 1MB
wal_buffers = 512kB                    # min 32kB
commit_delay = 50000                   # range 0-100000, in microseconds
commit_siblings = 1                    # range 1-1000
random_page_cost = 2.0                 # same scale as above
effective_cache_size = 16GB
from_collapse_limit = 10
join_collapse_limit = 10               # 1 disables collapsing of explicit
log_destination = 'stderr'             # Valid values are combinations of
client_min_messages = warning          # values in order of decreasing detail:
log_min_messages = warning             # values in order of decreasing detail:
log_min_duration_statement = 1000      # -1 is disabled, 0 logs all statements
log_line_prefix = '%t '                # special values:
autovacuum = on                        # Enable autovacuum subprocess?  'on'
datestyle = 'iso, mdy'
timezone = EST5EDT                     # actually, defaults to TZ environment
client_encoding = sql_ascii            # actually, defaults to database
lc_messages = 'en_US.UTF-8'            # locale for system error message
lc_monetary = 'en_US.UTF-8'            # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'             # locale for number formatting
lc_time = 'en_US.UTF-8'                # locale for time formatting
default_text_search_config = 'pg_catalog.english'
standard_conforming_strings = off
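(In case it helps to separate what is actually set somewhere from the
commented-out defaults above, a variation of the earlier pg_settings
query run against the live 9.2 instance should do it. I'm sketching this
from memory, so treat it as untested:)

  SELECT name, setting, source
    FROM pg_settings
   WHERE source NOT IN ('default', 'override')
   ORDER BY name;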
Tomas Vondra wrote
>> Tomas Vondra wrote
>>>> Here's the thing, the disk isn't saturated. The behavior I'm seeing
>>>> seems very odd to me; I'm seeing the source disk which holds the
>>>> dump saturated by reads, which is great, but then I just see nothing
>>>> being written to the postgres volume. Just nothing happening, then a
>>>> small burst. There is no write queue backup on the destination disk
>>>> either. If I look at pg_stat_activity I'll see something like:
>>>>
>>>> COPY salesforce_reconciliation (salesforce_id, email,
>>>> advisor_salesforce_id, processed) FROM stdin
>>>>
>>>> and even for small tables, that seems to take a very long time even
>>>> though the destination disk is almost at 0 utilization.
>>>
>>> So, where's the bottleneck? Clearly, there's one, so is it a CPU, a
>>> disk or something else? Or maybe network, because you're using EBS?
>>>
>>> What do you mean by 'utilization'? How do you measure that?
>>
>> The bottleneck is I/O somehow. I say somehow because I see iowait
>> averaging about 50% across two CPUs, but there are really no writes to
>> the destination EBS volume happening, just reads from the disk where
>> the source dump is located, then bursts of writes to the destination
>> volume every so often. It's kind of puzzling. This is happening on
>> multiple database servers, in multiple availability zones. Driving me
>> bonkers.
>>
>> What I mean by utilization is %util from iostat -m -x 1.
>
> I find this rather contradictory. At one moment you say the disk isn't
> saturated, the next moment you say you're I/O bound.
>
> Also, iowait (as reported e.g. by 'top') is tricky to interpret
> correctly, especially on multi-CPU systems (nice intro to the
> complexity in [1]). It's really difficult to interpret the 50% iowait
> without more info about what's happening on the machine.
>
> IMHO, the utilization (as reported by iotop) is much easier to
> interpret, because it means '% of time the device was servicing
> requests'. It has issues too, because 100% does not mean 'saturated'
> (especially on RAID arrays that can service multiple requests in
> parallel), but it's better than iowait.
>
> If I had to guess from your info, I'd bet you're CPU bound, so there's
> very little idle time and about 50% of it is spent waiting for I/O
> requests (hence the 50% iowait). But in total the amount of I/O is
> very small, so %util is ~0.
>
> Please, post a few lines of 'iostat -x -k 1' output. Samples from 'top'
> and 'vmstat 1' would be handy too.
>
> regards
> Tomas

Well, I'm confused by this whole thing too, which is why I came here. I
can gather those statistics, but first a quick question: could this be
caused by frivolous indexes or something like that?
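For what it's worth, if the index angle is worth chasing, a rough query
like this against the restored database should show which tables carry
the most indexes (written from memory, untested):

  SELECT schemaname, tablename, count(*) AS index_count
    FROM pg_indexes
   GROUP BY schemaname, tablename
   ORDER BY index_count DESC
   LIMIT 20;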