On 29.10.2014 16:12, jmcdonagh wrote: > Hi Tomas- thank you for your thoughtful response! > > > Tomas Vondra wrote >> On 28.10.2014 21:55, jmcdonagh wrote: >>> Hi, we have a nightly job that restores current production data to >>> the development databases in a 'warm spare' database so that if the >>> developers need fresh data, it's ready during the day. When we moved >>> from 9.0 to 9.2 suddenly the restores began to take from a few hours >>> to more like 15 hours or so. We're in Amazon EC2, I've tried new EBS >>> volumes, warmed them up, threw IOPS at them, pretty much all the >>> standard stuff to get more disk performance. >> >> So, if I understand it correctly, you've been restoring into 9.0, then >> you switched to 9.2 and it's much slower? > > Yes- but since the move was done utilizing snapshots so the move > involves new volumes, but I have created new volumes since then to > rule out a single bad volume. My advice would be to do some basic low-level performance tests to rule this out. Use dd or (better) fio to test basic I/O performance, it's much easier to spot issues that way. > Tomas Vondra wrote >> Is the 9.2 configured equally to 9.0? If you do something like this >> >> SELECT name, setting >> FROM pg_settings >> WHERE source = 'configuration file'; >> >> on both versions, what do you get? > > I no longer have the 9.0 box up but we do track configuration via > puppet and git. The only configuration change made for 9.2 is: > > -#standard_conforming_strings = off > +standard_conforming_strings = off Compared to 9.0, I suppose? Anyway, post the non-default config values at least for 9.2, please. > Cause we have an old app that needs this setting on otherwise we'd > spend a lot of time trying to fix it. I doubt standard_conforming_strings has anything to do with the issues. > Tomas Vondra wrote >>> Here's the thing, the disk isn't saturated. The behavior I'm seeing >>> seems very odd to me; I'm seeing the source disk which holds the dump >>> saturated by reads, which is great, but then I just see nothing being >>> written to the postgres volume. Just nothing happening, then a >>> small burst. There is no write queue backup on the destination disk >>> either. if I look at pg_stat_activity I'll see something like: >>> >>> COPY salesforce_reconciliation (salesforce_id, email, >>> advisor_salesforce_id, processed) FROM stdin >>> >>> and even for small tables, that seems to take a very long time even >>> though the destination disk is almost at 0 utilization. >> >> So, where's the bottleneck? Clearly, there's one, so is it a CPU, a >> disk or something else? Or maybe network, because you're using EBS? >> >> What do you mean by 'utilization'? How do you measure that? > > The bottleneck is I/O somehow. I say somehow, because I see iowait > averaging about 50% between two CPUs, but there is just no writes to > the destination EBS volume really happening, just reads from the > disk where the source dump is located, then bursts of writes to the > destination volume every so often. It's kind of puzzling. This is > happening on multiple database servers, in multiple availability > zones. Driving me bonkers. > > What I mean by utilization is util% from iostat -m -x 1. I find this rather contradictory. At one moment you say the disk isn't saturated, the next moment you say you're I/O bound. Also, iowait (as reported e.g. by 'top') is tricky to interpret correctly, especially on multi-cpu systems (nice intro to the complexity [1]). It's really difficult to interpret the 50% iowait without more info about what's happening on the machine. IMHO, the utilization (as reported by iotop) is much easier to interpret, because it means '% of time the device was servicing requests'. It has issues too, because 100% does not mean 'saturated' (especially on RAID arrays that can service multiple requests in parallel), but it's better than iowait. If I had to guess based from your info, I'd bet you're CPU bound, so there's very little idle time and about 50% of it is spent waiting for I/O requests (hence the 50% iowait). But in total the amount of I/O is very small, so %util is ~0. Please, post a few lines of 'iostat -x -k 1' output. Samples from 'top' and 'vmstat 1' would be handy too. regards Tomas [1] http://veithen.blogspot.cz/2013/11/iowait-linux.html -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance