On Tue, Oct 26, 2010 at 5:41 AM, Robert Haas <robertmhaas@xxxxxxxxx> wrote:
> On Fri, Oct 22, 2010 at 3:05 PM, Kevin Grittner
> <Kevin.Grittner@xxxxxxxxxxxx> wrote:
>> Rob Wultsch <wultsch@xxxxxxxxx> wrote:
>>
>>> I would think full_page_writes=off + double write buffer should be
>>> far superior, particularly given that the WAL is shipped over the
>>> network to slaves.
>>
>> For a reasonably brief description of InnoDB double write buffers, I
>> found this:
>>
>> http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/
>>
>> One big question before even considering this would be how to
>> determine whether a potentially torn page "is inconsistent".
>> Without a page CRC or some such mechanism, I don't see how this
>> technique is possible.
>
> There are two sides to this problem: figuring out when to write a page
> to the double write buffer, and figuring out when to read it back from
> the double write buffer. The first seems easy: we just do it whenever
> we would XLOG a full page image. As to the second, when we write the
> page out to the double write buffer, we could also write to the double
> write buffer the LSN of the WAL record which depends on that full page
> image. Then, at the start of recovery, we scan the double write
> buffer and remember all those LSNs. When we reach one of them, we
> replay the full page image.
>
> The good thing about this is that it would reduce WAL volume; the bad
> thing about it is that it would probably mean doing two fsyncs where
> we only now do one.
>

The double write buffer is one of the few areas where InnoDB does more
IO (in the form of fsyncs) than PG. InnoDB also has fuzzy checkpoints
(which help to keep dirty pages in memory longer), buffering of writes
to secondary indexes, and, more recently, tunable page-level compression.
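
To make the recovery scheme Robert describes in the quoted text concrete, here is a minimal Python sketch. It is not PostgreSQL or InnoDB code. The names `DoubleWriteBuffer` and `recover`, the in-memory storage, and the WAL record layout `(lsn, page_id, offset, data)` are all assumptions made for illustration. The sketch also assumes that the saved image reflects the page after the dependent WAL record's change, and it adds a CRC on each buffer entry, which the quoted proposal does not mention.

```python
import zlib

PAGE_SIZE = 8192  # PostgreSQL's default block size


class DoubleWriteBuffer:
    """Hypothetical in-memory stand-in for an on-disk double write buffer."""

    def __init__(self):
        self._slots = []  # each slot: (lsn, page_id, image, crc)

    def write(self, lsn, page_id, image):
        # Store the page image together with the LSN of the WAL record
        # that depends on it. The CRC is an assumption of this sketch.
        image = bytes(image)
        self._slots.append((lsn, page_id, image, zlib.crc32(image)))

    def scan(self):
        # Start of recovery: remember every LSN that has an intact image.
        found = {}
        for lsn, page_id, image, crc in self._slots:
            if zlib.crc32(image) == crc:
                found[lsn] = (page_id, image)
        return found


def recover(pages, wal, dwb):
    """Replay WAL records in LSN order over pages (page_id -> bytearray)."""
    images = dwb.scan()
    for lsn, page_id, offset, data in sorted(wal, key=lambda rec: rec[0]):
        if lsn in images:
            # Reached a remembered LSN: restore the full page image
            # instead of applying the record's change to a possibly
            # torn page.
            img_page_id, image = images[lsn]
            pages[img_page_id] = bytearray(image)
        else:
            # Ordinary record: apply the change in place.
            pages[page_id][offset:offset + len(data)] = data
    return pages
```

In this sketch the WAL carries only the small change records; the full page images live in the double write buffer, which is where the reduction in WAL volume would come from. The extra fsync Robert mentions corresponds to making `write` durable before the data page itself is written.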

Given that InnoDB is not shipping its logs across the wire, I don't
think many users would really care whether it used the double write
buffer or the full page writes approach to the redo log (other than the
fact that the log files would be bigger). PG, on the other hand, *is*
pushing its logs over the wire...

-- 
Rob Wultsch
wultsch@xxxxxxxxx