On 19/08/17 02:21, Jeremy Finzel wrote:
On Tue, Aug 15, 2017 at 12:07 PM, Scott Marlowe <scott.marlowe@xxxxxxxxx> wrote:
So do iostat or iotop show you if / where your disks are working
hardest? Or is this CPU overhead that's killing performance?
Sorry for the delayed reply. I took a more detailed look at the query plans
from our problem query during this incident. There are actually 6 plans,
because there were 6 unique queries. I traced one execution through our logs
and found something really interesting: the first 5 queries all create temp
tables, and each of them took upwards of 500ms to run. The final query,
however, is a simple select from the last temp table, and it took 0.035ms!
This really confirms, I think, that the issue somehow had to do with
/writing/ to the SAN. Of course this doesn't answer a whole lot, because we
had no other apparent issues with write performance at all. I've also
included some graphs below.
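To pin down whether it really is the temp-table writes, one thing worth
trying (a sketch only, not from the original thread - the table and predicate
below are invented) is to run one of the slow steps under
EXPLAIN (ANALYZE, BUFFERS) and compare its written block counts against the
cheap final select:

  -- Hypothetical stand-in for one of the five slow steps; "orders" and the
  -- filter are made up for illustration. Note that ANALYZE actually runs
  -- the statement, so the temp table really does get created.
  EXPLAIN (ANALYZE, BUFFERS)
  CREATE TEMP TABLE stage_1 AS
  SELECT order_id, customer_id, amount
  FROM   orders
  WHERE  created_at >= now() - interval '1 day';

  -- The final step, for comparison: a plain read of the temp table.
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT * FROM stage_1 WHERE amount > 100;

If the CREATE steps report far more dirtied/written buffers in their Buffers
lines than the final select does, that lines up with the write-side
explanation.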
Hi, graphs for latency (or await etc.) might be worth looking at too -
sometimes the troughs between the IO spikes are actually the periods when the
disks have been overwhelmed by queued-up pending IOs...
Also SANs are notorious for this sort of thing - typically they have a
big RAM cache that you are actually writing to, and everything is nice
and fast until your workload (along with everyone else's) fills up the
cache, and then performance drops off a cliff for a while (I've seen SAN
disks with iostat utilizations of 105% <-- Lol... and await numbers that
scroll off the page in that scenario)!
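One way to tie those cache-saturation windows back to what the backends were
actually waiting on (a sketch only - it assumes pg_stat_statements is loaded
and track_io_timing = on, with the column names as they were in the
PostgreSQL versions current at the time of this thread; the LIMIT is
arbitrary):

  -- Statements ranked by time spent waiting on block writes (ms).
  -- Requires shared_preload_libraries = 'pg_stat_statements'
  -- and track_io_timing = on.
  SELECT query,
         calls,
         round(total_time::numeric, 1)     AS total_ms,
         round(blk_write_time::numeric, 1) AS write_wait_ms
  FROM   pg_stat_statements
  ORDER  BY blk_write_time DESC
  LIMIT  10;

If blk_write_time balloons during the same periods the latency graphs spike,
that's a fairly strong hint the backends were stuck behind the saturated
write cache.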
regards
Mark