On 12 November 2013 07:49 Maxim Boguk wrote:
> I have few question about checkpoints during create database.
>
> First just extract from log on my test database 9.2.4:
>
> 2013-11-12 03:48:31 MSK 1717 @ from [vxid: txid:0] [] LOG: checkpoint starting: immediate force wait
> 2013-11-12 03:48:31 MSK 1717 @ from [vxid: txid:0] [] LOG: checkpoint complete: wrote 168 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.314 s, sync=0.146 s, total=0.462 s; sync files=104, longest=0.040 s, average=0.001 s
> 2013-11-12 03:48:32 MSK 1717 @ from [vxid: txid:0] [] LOG: checkpoint starting: immediate force wait
> 2013-11-12 03:48:32 MSK 1717 @ from [vxid: txid:0] [] LOG: checkpoint complete: wrote 6 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.311 s, sync=0.002 s, total=0.315 s; sync files=6, longest=0.000 s, average=0.000 s
> 2013-11-12 03:48:32 MSK 13609 postgres@hh_data from [local] [vxid:502/0 txid:0] [CREATE DATABASE] LOG: duration: 1160.409 ms statement: create database _tmp;
>
> So during creating of database two immediate force checkpoints was performed.
>
> Now questions:
>
> 1) Why these checkpoints performed at all? I understood why checkpoint performed during drop database (to clean shared buffers from the dropped db data), but why issue checkpoint during create database?
>
> 2) Why two checkpoints performed one after one?

The two checkpoints are not performed back to back. The first is performed before starting the copy of the source database, and the second is performed before committing. The code comments attached to the two checkpoints explain why each one is needed.

First one:

    /*
     * Force a checkpoint before starting the copy. This will force dirty
     * buffers out to disk, to ensure source database is up-to-date on disk
     * for the copy. FlushDatabaseBuffers() would suffice for that, but we
     * also want to process any pending unlink requests. Otherwise, if a
     * checkpoint happened while we're copying files, a file might be deleted
     * just when we're about to copy it, causing the lstat() call in copydir()
     * to fail with ENOENT.
     */

Second one:

    /*
     * We force a checkpoint before committing. This effectively means
     * that committed XLOG_DBASE_CREATE operations will never need to be
     * replayed (at least not in ordinary crash recovery; we still have to
     * make the XLOG entry for the benefit of PITR operations). This
     * avoids two nasty scenarios:
     *
     * #1: When PITR is off, we don't XLOG the contents of newly created
     * indexes; therefore the drop-and-recreate-whole-directory behavior
     * of DBASE_CREATE replay would lose such indexes.
     *
     * #2: Since we have to recopy the source database during DBASE_CREATE
     * replay, we run the risk of copying changes in it that were
     * committed after the original CREATE DATABASE command but before the
     * system crash that led to the replay. This is at least unexpected
     * and at worst could lead to inconsistencies, eg duplicate table
     * names.
     *
     * (Both of these were real bugs in releases 8.0 through 8.0.3.)
     *
     * In PITR replay, the first of these isn't an issue, and the second
     * is only a risk if the CREATE DATABASE and subsequent template
     * database change both occur while a base backup is being taken.
     * There doesn't seem to be much we can do about that except document
     * it as a limitation.
     *
     * Perhaps if we ever implement CREATE DATABASE in a less cheesy way,
     * we can avoid this.
     */

> 3) Is there any good way to perform spread checkpoint during create database (similar to --checkpoint=spread for the pg_basebackup) ?
> I'm ready to wait 30 min for create database in that case...
> I asking because performing immediate checkpoint on the large heavy loaded database - good recipe for downtime (IO become overloaded to point of the total stall)...
> Is there any workaround for this problem?
>
> 4) Is idea to add an option for create/drop database syntax to control checkpoint behaviour sounds reasonable?
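
To make the ordering described by the two code comments concrete, here is a minimal stand-alone sketch in C. It is not the server code: every name in it (request_checkpoint_stub, copy_template_stub, create_database_sketch and so on) is a hypothetical stand-in that only records the order of steps, and the sequence is simplified from what the comments describe.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the server internals; illustration only. */
enum step { STEP_CHECKPOINT, STEP_COPY, STEP_WAL_RECORD, STEP_COMMIT };

#define MAX_STEPS 8
static enum step trace[MAX_STEPS];   /* order in which steps ran */
static int n_steps = 0;
static int n_checkpoints = 0;

static void record(enum step s, const char *what)
{
    assert(n_steps < MAX_STEPS);
    trace[n_steps++] = s;
    printf("%s\n", what);
}

/* Stands in for an "immediate force wait" checkpoint request. */
static void request_checkpoint_stub(const char *why)
{
    n_checkpoints++;
    record(STEP_CHECKPOINT, why);
}

static void copy_template_stub(void)
{
    /* The copy relies on the source database being flushed to disk
     * and on pending unlink requests having been processed already. */
    assert(n_checkpoints == 1);
    record(STEP_COPY, "copy template database directory");
}

/* Returns the number of checkpoints the sketched sequence performs. */
static int create_database_sketch(void)
{
    n_steps = 0;
    n_checkpoints = 0;

    request_checkpoint_stub("checkpoint #1: before starting the copy");
    copy_template_stub();
    record(STEP_WAL_RECORD, "write XLOG_DBASE_CREATE record (kept for PITR)");
    request_checkpoint_stub("checkpoint #2: before committing");
    record(STEP_COMMIT, "commit");

    return n_checkpoints;
}
```

Calling create_database_sketch() from a main() prints the five steps in order and returns 2, which matches the two "immediate force wait" checkpoints seen in the log: the copy, not idle time, sits between them.
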

Regards,
Hari babu.