On 19/08/17 02:21, Jeremy Finzel wrote:
On Tue, Aug 15, 2017 at 12:07 PM, Scott Marlowe <scott.marlowe@xxxxxxxxx> wrote:
So do iostat or iotop show you if / where your disks are working
hardest? Or is this CPU overhead that's killing performance?
Sorry for the delayed reply. I took a more detailed look at the query plans
from our problem query during this incident. There are actually 6 plans,
because there were 6 unique queries. I traced one execution through our logs
and found something really interesting: the first 5 queries all create temp
tables, and each of them took upwards of 500ms to run. The final query,
however, is a simple select from the last temp table, and it took 0.035ms!
This really confirms, I think, that the issue somehow had to do with
/writing/ to the SAN. Of course this doesn't answer a whole lot, because we
had no other apparent issues with write performance at all. I've also
included some graphs below.
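To pin down whether it really is the temp-table writes, one thing worth
trying (a sketch only, not from the original thread - the table and predicate
below are invented) is to run one of the slow steps under
EXPLAIN (ANALYZE, BUFFERS) and compare its written block counts against the
cheap final select:

  -- Hypothetical stand-in for one of the five slow steps; "orders" and the
  -- filter are made up for illustration. Note that ANALYZE actually runs
  -- the statement, so the temp table really does get created.
  EXPLAIN (ANALYZE, BUFFERS)
  CREATE TEMP TABLE stage_1 AS
  SELECT order_id, customer_id, amount
  FROM   orders
  WHERE  created_at >= now() - interval '1 day';

  -- The final step, for comparison: a plain read of the temp table.
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT * FROM stage_1 WHERE amount > 100;

If the CREATE steps report far more dirtied/written buffers in their Buffers
lines than the final select does, that lines up with the write-side
explanation.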
Hi, graphs for latency (or await etc.) might be worth looking at too -
sometimes the troughs between the IO spikes are actually the periods when the
disks have been overwhelmed by queued-up pending IOs...
Also SANs are notorious for this sort of thing - typically they have a
big RAM cache that you are actually writing to, and everything is nice
and fast until your workload (along with everyone else's) fills up the
cache, and then performance drops off a cliff for a while (I've seen SAN
disks with iostat utilizations of 105% <-- Lol... and await numbers that
scroll off the page in that scenario)!
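One way to tie those cache-saturation windows back to what the backends were
actually waiting on (a sketch only - it assumes pg_stat_statements is loaded
and track_io_timing = on, with the column names as they were in the
PostgreSQL versions current at the time of this thread; the LIMIT is
arbitrary):

  -- Statements ranked by time spent waiting on block writes (ms).
  -- Requires shared_preload_libraries = 'pg_stat_statements'
  -- and track_io_timing = on.
  SELECT query,
         calls,
         round(total_time::numeric, 1)     AS total_ms,
         round(blk_write_time::numeric, 1) AS write_wait_ms
  FROM   pg_stat_statements
  ORDER  BY blk_write_time DESC
  LIMIT  10;

If blk_write_time balloons during the same periods the latency graphs spike,
that's a fairly strong hint the backends were stuck behind the saturated
write cache.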
regards
Mark