On 07/11/2011 02:43 PM, Merlin Moncure wrote:
On Mon, Jul 11, 2011 at 2:16 PM, Kevin Grittner
<Kevin.Grittner@xxxxxxxxxxxx> wrote:
lars<lhofhansl@xxxxxxxxx> wrote:
Stopping the UPDATEs, waiting for any CHECKPOINTs to finish,
and then running the SELECTs indeed shows a similar slowdown.
Interestingly I see very heavy WAL traffic while executing the
SELECTs. (So I was confused as to what caused the WAL traffic).
Hint bit changes aren't logged, so if it was that you would be
seeing writes to the heap, but not to the WAL. Clean-up of dead
tuples is logged -- this is probably the result of pruning dead
tuples. You could probably reduce the impact on your SELECT
statements at least a little by making autovacuum more aggressive.
yeah. In fact, I'd like to disable autovacuum completely just to
confirm this. In particular I'd like to know if that removes wal
traffic when only selects are going on. Another way to check is to
throw some queries to pg_stat_activity during your select period and
see if any non-select activity (like autovacum vacuum). Basically I'm
suspicious there is more to this story.
hint bit flusing causing i/o during SELECT is a typical complaint
(especially on systems with underperformant i/o), but I'm suspicious
if that's really the problem here. Since you are on a virtualized
platform, I can't help but wonder if you are running into some
bottleneck that you wouldn't see on native hardware.
What's iowait during the slow period?
merlin
Thanks Kevin and Merlin this is extremely helpful...
Ok, that makes much more sense (WALing hint bits did not make sense).
I disabled auto-vacuum and did four tests:
1. Run a bunch of updates, stop that process, wait until checkpointing
is finished, and run the selects (as before).
2. run VACUUM manually, then run the SELECTs
3. Have the UPDATEs and SELECTs touch a mutually exclusive, random sets
of row (still in sets of 10000).
So the SELECTs are guaranteed not to select rows that were updated.
4. Lastly, change the UPDATEs to update a non-indexed column. To rule
out Index maintenance. Still distinct set of rows.
In the first case I see the same slowdown (from ~14ms to ~400-500ms).
pg_stat_activity shows no other load during that
time. I also see write activity only on the WAL volume.
In the 2nd case after VACUUM is finished the time is back to 14ms. As an
aside: If I run the SELECTs while VACUUM is
running the slowdown is about the same as in the first case until
(apparently) VACUUM has cleaned up most of the table,
at which point the SELECTs become faster again (~50ms).
In the 3rd case I see exactly the same behavior, which is interesting.
Both before VACUUM is run and after.
There's no guarantee obviously that distinct rows do not share the same
page of course especially since the index is
updated as part of this (but see the 4th case).
In case 4 I still see the same issue. Again both before and after VACUUM.
In all cases I see from pg_stat_bgwriter that no backend writes buffers
directly (but I think that only pertains to dirty buffers, and not the WAL).
So I think I have a partial answer to my initial question.
However, that brings me to some other questions:
Why do SELECTs cause dead tuples to be pruned (as Kevin suggests)?
That even happens when the updates did not touch the selected rows(?)
And why does that slow down the SELECTs? (checkpointing activity on the
EBS volume holding the database for example
does not slow down SELECTs at all, only WAL activity does). Does the
selecting backend do that work itself?
Lastly, is this documented somewhere? (I apologize if it is and I missed
it). If not I'd be happy to write a wiki entry for this.
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance