On Sun, Aug 25, 2013 at 7:57 PM, 高健 <luckyjackgao@xxxxxxxxx> wrote: > Hi : > > Thanks to Alvaro! Sorry for replying lately. > > I have understood a little about it. > > But the description of full_page_write made me even confused. Sorry that > maybe I go to another problem: > > It is said: > http://www.postgresql.org/docs/9.2/static/runtime-config-wal.html#GUC-FULL-PAGE-WRITES > ---------- > When this parameter is on, the PostgreSQL server writes the entire content > of each disk page to WAL during the first modification of that page after a > checkpoint. This is needed because a page write that is in process during an > operating system crash might be only partially completed, leading to an > on-disk page that contains a mix of old and new data. > ------- > > Let me imagine that: > On a disk page, there are following data: > > id=1 val=1 with transaction id of 1001 > id=2 val=2 with transaction id of 1002 > id=3 val=3 with transaction id of 1003 > > If I start DB, > And begin with transaction id of 2002 deal with data of id=2 ,making val to > 20 > Then with trsansaction id of 2003 deal with data of id=3,making val to 30 > > If With full_page_write =off, > When my checkpoint occur, it succeed with transaction 2002 but failed with > 2003 because of crash. A checkpoint either succeeds or fails. It cannot succeed with some transactions and fail with others. > Then disk page will be of: > > id=1 val=1 with transaction id of 1001------maybe this is the very old data > id=2 val=20 with transaction id of 2002------This is now new data > id=3 val=3 with transaction id of 1003------This is old data. Postgres does not do in-place updates, it marks the old row as obsolete and creates a new one. id=1 val= 1 with transaction id of 1001 id=2 val= 2 with transaction id of 1002 xmax of 2002 id=3 val= 3 with transaction id of 1003 xmax of 2003 id=2 val=20 with transaction id of 2002 id=3 val=30 with transaction id of 2003 Of course the whole point of a torn page write is that you don't how much got written, so you don't know what is actually on the disk. > When DB restart from crash, > I think that there are wal data of transaction id 2002 and 2003 beause > that wal written to wal_buffer is before data written to shared_buffer. The wal is written out of wal_buffer and flushed, before the corresponding block is written out of shared_buffer. > So if Online wal log file is ok, there will be no data lost, and > roll-forward and roll-back can happen. > If some online wal log file is dmaged during crash: > There might be some data lost,but if we have archive log, we can restore > back due to archive wal log's latest transaction id. Most WAL records would have no problem being applied to a block that is an otherwise uncorrupted mix of old and new. But some WAL records have instructions that amount to "grab 134 bytes from offset 7134 in the block and move them to offset 1623". If the block is an unknown mix of old and new data, that can't be carried out safely. Cheers, Jeff -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general