Scott Marlowe <scott.marlowe@xxxxxxxxx> writes: > On Mon, Aug 14, 2017 at 2:46 PM, Jeremy Finzel <finzelj@xxxxxxxxx> wrote: > >> On Mon, Aug 14, 2017 at 3:01 PM, Scott Marlowe <scott.marlowe@xxxxxxxxx> >> wrote: >>> >>> On Mon, Aug 14, 2017 at 1:53 PM, Jeremy Finzel <finzelj@xxxxxxxxx> wrote: >>> > >>> > Any insights would be greatly appreciated, as we are concerned not >>> > knowing >>> > the root cause. >>> >>> How are your disks setup? One big drive with everything on it? >>> Separate disks for pg_xlog and pg's data dir and the OS logging? IO >>> contention is one of the big killers of db performance. >> >> >> It's one san volume ssd for the data and wal files. But logging and memory >> spilling and archived xlogs go to a local ssd disk. >> >>> >>> Logging likely isn't your problem, but yeah you don't need to log >>> ERRYTHANG to see the problem either. Log long running queries temp >>> usage, buffer usage, query plans on slow queries, stuff like that. >>> >>> You've likely hit a "tipping point" in terms of data size. Either it's >>> cause the query planner to make a bad decision, or you're spilling to >>> disk a lot more than you used to. >>> >>> Be sure to log temporary stuff with log_temp_files = 0 in your >>> postgresql.conf and then look for temporary file in your logs. I bet >>> you've started spilling into the same place as your temp tables are >>> going, and by default that's your data directory. Adding another drive >>> and moving pgsql's temp table space to it might help. >> >> >> We would not have competition between disk spilling and temp tables because >> what I described above - they are going to two different places. Also, I >> neglected to mention that we turned on auto-explain during this crisis, and >> found the query plan was good, it was just taking forever due to thrashing >> just seconds after we kicked off the batches. I did NOT turn on log_analyze >> and timing but it was enough to see there was no apparent query plan >> regression. Also, we had no change in the performance/plan after >> re-analyzing all tables. > > You do know that temp tables go into the default temp table space, > just like sorts, right? Not so. This system has no defined temp_tablespace however spillage due to sorting/hashing that exceeds work_mem goes to base/pgsql_tmp which we have symlinked out to a local SSD drive. We do run a few of our other systems with temp_tablespace defined and for these the heap/index files do share same volume as other temp usage. Thx > > Have you used something like iostat to see which volume is getting all the IO? > >> >>> >>> Also increasing work_mem (but don't go crazy, it's per sort, so can >>> multiply fast on a busy server) >> >> >> We are already up at 400MB, and this query was using memory in the low KB >> levels because it is very small (1 - 20 rows of data per temp table, and no >> expensive selects with missing indexes or anything). > > Ahh so it doesn't sound like it's spilling to disk then. Do the logs > say yes or no on that? > > Basically use unix tools to look for where you're thrashing. iotop can > be handy too. -- Jerry Sievers Postgres DBA/Development Consulting e: postgres.consulting@xxxxxxxxxxx p: 312.241.7800 -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance