On Wed, Oct 27, 2010 at 12:41 AM, Rob Wultsch <wultsch@xxxxxxxxx> wrote:
> On Tue, Oct 26, 2010 at 7:25 AM, Robert Haas <robertmhaas@xxxxxxxxx> wrote:
>> On Tue, Oct 26, 2010 at 10:13 AM, Rob Wultsch <wultsch@xxxxxxxxx> wrote:
>>> The double write buffer is one of the few areas where InnoDB does more
>>> IO (in the form of fsyncs) than PG. InnoDB also has fuzzy checkpoints
>>> (which help to keep dirty pages in memory longer), buffering of changes
>>> to secondary indexes, and, recently, tunable page-level compression.
>>
>> Baron Schwartz was talking to me about this at Surge. I don't really
>> understand how the fuzzy checkpoint stuff works, and I haven't been
>> able to find a good description of it anywhere. How does it keep
>> dirty pages in memory longer? Details on the other things you mention
>> would be interesting to hear, too.
>
> For checkpoint behavior:
> http://books.google.com/books?id=S_yHERPRZScC&pg=PA606&lpg=PA606&dq=fuzzy+checkpoint&source=bl&ots=JJrzRUKBGh&sig=UOMPsRy5E-YDgjAFkaSVn3dps_M&hl=en&ei=_k8yTOfeHYzZnAepyumLBA&sa=X&oi=book_result&ct=result&resnum=8&ved=0CEYQ6AEwBw#v=onepage&q=fuzzy%20checkpoint&f=false
>
> I would think that, in the best case, "sharp" checkpoints with a large
> checkpoint_completion_target would behave much like a fuzzy checkpoint.

Well, under that definition of a fuzzy checkpoint, our checkpoints are
fuzzy even with checkpoint_completion_target=0.

What Baron seemed to be describing was a scheme whereby you could do
what I might call partial checkpoints. IOW, you want to move the redo
pointer without writing out ALL the dirty buffers in memory, so you
write out the pages with the oldest LSNs and then move the redo pointer
up to the oldest LSN you have left. Except that doesn't quite work,
because a page might have been dirtied at LSN X and then updated again
at LSN Y, and you still have to flush it to disk before moving the redo
pointer to any value > X. So you work around that by maintaining a
"first dirtied" LSN for each page in addition to its current LSN
(there's a code sketch of this below). I'm not 100% sure that this is
how it works or that it would work in PG, but even assuming that it is
and does, I'm not sure what the benefit is over the
checkpoint-spreading logic we have now.

There might be some benefit in sorting the writes that we do, so that
we can spread out the fsyncs: write all the blocks for a given file,
fsync, and then repeat for each underlying data file that has at least
one dirty block (also sketched below). But that's completely orthogonal
to (and would actually be hindered by) the approach described in the
preceding paragraph.

> Insert buffering (for InnoDB 1.1+ there is evidently also delete and
> purge buffering):
> http://dev.mysql.com/doc/refman/5.5/en/innodb-insert-buffering.html

We do something a bit like this for GIN indexes. It would be
interesting to see if it also has a benefit for btree indexes.

> For a recent ~800GB db I had to restore, the insert buffer saved 92%
> of the IO needed for secondary indexes.
>
> Compression:
> http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-compression-internals.html
>
> For many workloads 50% compression results in negligible impact to
> performance. For certain workloads compression can help performance.
> Please note that InnoDB also has a non-tunable TOAST-like feature.

Interesting. I am surprised this works well. It seems that this only
works for pages that can be compressed by >=50%, which seems like it
could result in a lot of CPU wasted on failed attempts to compress.
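Just to make that concern concrete, here is roughly the compress-or-fail
test as I understand it. This is purely an illustrative sketch, not
InnoDB's actual code: the 16K/8K page sizes and the use of plain zlib
are my assumptions.

/*
 * Illustrative sketch only: try to squeeze a 16K page into an 8K
 * on-disk page, and report failure if it doesn't fit.  The page sizes
 * and the use of plain zlib are assumptions, not a statement of what
 * the InnoDB plugin actually does.
 */
#include <stdio.h>
#include <zlib.h>

#define LOGICAL_PAGE_SIZE   16384
#define ON_DISK_PAGE_SIZE    8192

static int
page_compresses_enough(const unsigned char *page, unsigned char *out)
{
    uLongf outlen = ON_DISK_PAGE_SIZE;

    /*
     * compress2() returns Z_BUF_ERROR when the output buffer is too
     * small, i.e. when the page did not shrink by at least 50%.  All
     * the CPU burned in deflate is then wasted, which is the worry
     * above.
     */
    return compress2(out, &outlen, page, LOGICAL_PAGE_SIZE,
                     Z_DEFAULT_COMPRESSION) == Z_OK;
}

int
main(void)
{
    static unsigned char page[LOGICAL_PAGE_SIZE]; /* all zeroes: compressible */
    static unsigned char out[ON_DISK_PAGE_SIZE];

    printf("page fits in half size: %s\n",
           page_compresses_enough(page, out) ? "yes" : "no");
    return 0;
}

Feed it incompressible data instead of zeroes and the deflate work is
simply thrown away, which is exactly the failure mode I'm worried about.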
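Going back to the partial-checkpoint idea, this is the kind of
bookkeeping I had in mind, as a minimal sketch. Every name and type
here is invented for illustration; nothing like it exists in our tree.

/*
 * Minimal sketch of "first dirtied" LSN tracking for partial
 * checkpoints.  All names and types are invented for illustration.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint64_t LSN;
#define INVALID_LSN UINT64_MAX

typedef struct BufferDesc
{
    LSN  first_dirtied;     /* LSN at which the page first became dirty */
    LSN  last_dirtied;      /* LSN of the most recent change */
    int  is_dirty;
} BufferDesc;

/* Stand-in for writing the page out; a real system would also fsync. */
static void
flush_buffer(BufferDesc *buf)
{
    buf->is_dirty = 0;
}

/*
 * Flush only those buffers first dirtied before 'target', then return
 * the oldest first_dirtied LSN still in memory: that is how far the
 * redo pointer can safely advance.  Using last_dirtied here would be
 * wrong: a page dirtied at X and updated again at Y > X still pins
 * the redo pointer at X until it is written.
 */
static LSN
partial_checkpoint(BufferDesc *bufs, int nbuffers, LSN target)
{
    LSN oldest_remaining = INVALID_LSN;

    for (int i = 0; i < nbuffers; i++)
    {
        if (!bufs[i].is_dirty)
            continue;
        if (bufs[i].first_dirtied < target)
            flush_buffer(&bufs[i]);
        else if (bufs[i].first_dirtied < oldest_remaining)
            oldest_remaining = bufs[i].first_dirtied;
    }
    return oldest_remaining;
}

int
main(void)
{
    BufferDesc bufs[] = {
        { 100, 500, 1 },    /* old page, re-dirtied later: must flush */
        { 300, 300, 1 },    /* newer page: can stay in memory */
    };

    printf("redo pointer can advance to %llu\n",
           (unsigned long long) partial_checkpoint(bufs, 2, 200));
    return 0;
}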
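And for the write-sorting idea, something like the per-file loop below;
again, the dirty-block representation and file names are made up for
the sake of the example.

/*
 * Sketch of sorting checkpoint writes by file so that each data file
 * gets one fsync, rather than syncing everything at the very end.
 */
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLCKSZ 8192

typedef struct DirtyBlock
{
    char   path[64];        /* underlying data file */
    off_t  offset;          /* byte offset of the block in that file */
    char   data[BLCKSZ];
} DirtyBlock;

static int
cmp_blocks(const void *a, const void *b)
{
    const DirtyBlock *x = a, *y = b;
    int c = strcmp(x->path, y->path);

    if (c != 0)
        return c;
    return (x->offset > y->offset) - (x->offset < y->offset);
}

/* Write all dirty blocks grouped by file, fsyncing once per file. */
static void
checkpoint_writes(DirtyBlock *blocks, int nblocks)
{
    qsort(blocks, nblocks, sizeof(DirtyBlock), cmp_blocks);

    for (int i = 0; i < nblocks; )
    {
        const char *path = blocks[i].path;
        int fd = open(path, O_WRONLY | O_CREAT, 0600);

        if (fd < 0) { perror("open"); return; }
        while (i < nblocks && strcmp(blocks[i].path, path) == 0)
        {
            if (pwrite(fd, blocks[i].data, BLCKSZ, blocks[i].offset) != BLCKSZ)
                perror("pwrite");
            i++;
        }
        fsync(fd);          /* one fsync per data file, not per block */
        close(fd);
    }
}

int
main(void)
{
    static DirtyBlock blocks[2];

    strcpy(blocks[0].path, "relfile_16385");
    blocks[0].offset = BLCKSZ;
    strcpy(blocks[1].path, "relfile_16384");
    blocks[1].offset = 0;

    checkpoint_writes(blocks, 2);
    return 0;
}

The point is just that the fsyncs get spread out across the checkpoint,
so the OS can be writing one file back while we are filling the next.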
>>> Given that InnoDB is not shipping its logs across the wire, I don't
>>> think many users would really care if it used the doublewrite-buffer
>>> or full-page-writes approach to the redo log (other than the fact
>>> that the log files would be bigger). PG, on the other hand, *is*
>>> pushing its logs over the wire...
>>
>> So how is InnoDB doing replication? Is there a second log just for that?
>
> The other log is the "binary log" and it is one of the biggest
> problems with MySQL. Running MySQL in such a way that the binary log
> stays in sync with the InnoDB redo log has a very significant impact
> on performance.
> http://www.mysqlperformanceblog.com/2010/10/23/mysql-limitations-part-2-the-binary-log/
> http://mysqlha.blogspot.com/2010/09/mysql-versus-mongodb-update-performance.html
> (check out the pretty graph)

Hmm. That seems kinda painful. Having to ship full page images over
the wire doesn't seem so bad by comparison, though I'm not very happy
about having to do that either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company