Re: Odd sudden performance degradation related to temp object churn

Jerry Sievers <gsievers19@xxxxxxxxxxx> · Mon, 14 Aug 2017 17:16:36 -0500

Scott Marlowe <scott.marlowe@xxxxxxxxx> writes:

> On Mon, Aug 14, 2017 at 2:46 PM, Jeremy Finzel <finzelj@xxxxxxxxx> wrote:
>
>> On Mon, Aug 14, 2017 at 3:01 PM, Scott Marlowe <scott.marlowe@xxxxxxxxx>
>> wrote:
>>>
>>> On Mon, Aug 14, 2017 at 1:53 PM, Jeremy Finzel <finzelj@xxxxxxxxx> wrote:
>>> >
>>> > Any insights would be greatly appreciated, as we are concerned not
>>> > knowing
>>> > the root cause.
>>>
>>> How are your disks setup? One big drive with everything on it?
>>> Separate disks for pg_xlog and pg's data dir and the OS logging? IO
>>> contention is one of the big killers of db performance.
>>
>>
>> It's one san volume ssd for the data and wal files.  But logging and memory
>> spilling and archived xlogs go to a local ssd disk.
>>
>>>
>>> Logging likely isn't your problem, but yeah you don't need to log
>>> ERRYTHANG to see the problem either. Log long running queries temp
>>> usage, buffer usage, query plans on slow queries, stuff like that.
>>>
>>> You've likely hit a "tipping point" in terms of data size. Either it's
>>> cause the query planner to make a bad decision, or you're spilling to
>>> disk a lot more than you used to.
>>>
>>> Be sure to log temporary stuff with log_temp_files = 0 in your
>>> postgresql.conf and then look for temporary file in your logs. I bet
>>> you've started spilling into the same place as your temp tables are
>>> going, and by default that's your data directory. Adding another drive
>>> and moving pgsql's temp table space to it might help.
>>
>>
>> We would not have competition between disk spilling and temp tables because
>> what I described above - they are going to two different places.  Also, I
>> neglected to mention that we turned on auto-explain during this crisis, and
>> found the query plan was good, it was just taking forever due to thrashing
>> just seconds after we kicked off the batches.  I did NOT turn on log_analyze
>> and timing but it was enough to see there was no apparent query plan
>> regression.  Also, we had no change in the performance/plan after
>> re-analyzing all tables.
>
> You do know that temp tables go into the default temp table space,
> just like sorts, right?

Not so.

This system has no defined temp_tablespace however spillage due to
sorting/hashing that exceeds work_mem goes to base/pgsql_tmp which we
have symlinked out to a local SSD drive.

We do run a few of our other systems with temp_tablespace defined and
for these the heap/index files do share same volume as other temp usage.

Thx

>
> Have you used something like iostat to see which volume is getting all the IO?
>
>>
>>>
>>> Also increasing work_mem (but don't go crazy, it's per sort, so can
>>> multiply fast on a busy server)
>>
>>
>> We are already up at 400MB, and this query was using memory in the low KB
>> levels because it is very small (1 - 20 rows of data per temp table, and no
>> expensive selects with missing indexes or anything).
>
> Ahh so it doesn't sound like it's spilling to disk then. Do the logs
> say yes or no on that?
>
> Basically use unix tools to look for where you're thrashing. iotop can
> be handy too.

-- 
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@xxxxxxxxxxx
p: 312.241.7800

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance