On Wed, 2004-05-12 at 20:18, Tom Lane wrote: > "Jeffrey W. Baker" <jwbaker@acm.org> writes: > > I have configured 24 checkpoint segments, > > which I expect gives me a worst-case usage in pg_xlog of 384MB. > > Unfortunately, during the CREATE INDEX, pg_xlog becomes full! > > Postgres version? AFAIR 7.1.2 or thereabouts would act that way, > because it wouldn't discard log segments containing work of open > transactions. This is 7.4.1. > The only reason I can think of for such behavior in a current version > is if the checkpoint isn't completing for some reason. Have you looked > in the postmaster log to see if checkpoint is bleating about anything? Oh sure, it's bleating. Apparently my computer is too fast: syslog:May 12 21:20:32 mistral postgres[17458]: [214-1] LOG: checkpoints are occurring too frequently (12 seconds apart) syslog:May 12 21:20:32 mistral postgres[17458]: [214-2] HINT: Consider increasing the configuration parameter "checkpoint_segments". That's also what preceeded the most recent crash: May 12 16:36:53 mistral postgres[506]: [173-1] LOG: checkpoints are occurring too frequently (16 seconds apart) May 12 16:36:53 mistral postgres[506]: [173-2] HINT: Consider increasing the configuration parameter "checkpoint_segments". May 12 16:37:08 mistral postgres[506]: [174-1] LOG: server process (PID 16403) was terminated by signal 6 May 12 16:37:08 mistral postgres[506]: [175-1] LOG: terminating any other active server processes May 12 16:37:08 mistral postgres[26212]: [3-1] WARNING: terminating connection because of crash of another server process May 12 16:37:08 mistral postgres[26212]: [3-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server May 12 16:37:08 mistral postgres[26212]: [3-3] process exited abnormally and possibly corrupted shared memory. May 12 16:37:08 mistral postgres[26212]: [3-4] HINT: In a moment you should be able to reconnect to the database and repeat your command. May 12 16:37:08 mistral postgres[14396]: [1-1] WARNING: terminating connection because of crash of another server process May 12 16:37:08 mistral postgres[14396]: [1-2] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server May 12 16:37:08 mistral postgres[14396]: [1-3] process exited abnormally and possibly corrupted shared memory. May 12 16:37:08 mistral postgres[14396]: [1-4] HINT: In a moment you should be able to reconnect to the database and repeat your command. May 12 16:37:08 mistral postgres[14396]: [1-5] CONTEXT: writing block 56889 of relation 17142/289600495 May 12 16:37:08 mistral postgres[506]: [176-1] LOG: all server processes terminated; reinitializing May 12 16:37:08 mistral postgres[17086]: [177-1] LOG: database system was interrupted at 2004-05-12 16:36:53 PDT May 12 16:37:08 mistral postgres[17086]: [178-1] LOG: checkpoint record is at 6/92B32D48 May 12 16:37:08 mistral postgres[17086]: [179-1] LOG: redo record is at 6/92B32D48; undo record is at 0/0; shutdown FALSE May 12 16:37:08 mistral postgres[17086]: [180-1] LOG: next transaction ID: 10232; next OID: 289600632 May 12 16:37:08 mistral postgres[17086]: [181-1] LOG: database system was not properly shut down; automatic recovery in progress May 12 16:37:08 mistral postgres[17086]: [182-1] LOG: redo starts at 6/92B32D88 May 12 16:45:29 mistral postgres[17086]: [183-1] LOG: could not open file "/var/lib/postgres/data/pg_xlog/0000000700000071" (log file 7, segment 113): No such file or directory May 12 16:45:29 mistral postgres[17086]: [184-1] LOG: redo done at 7/70FFC400 May 12 16:45:50 mistral postgres[506]: [177-1] LOG: startup process (PID 17086) was terminated by signal 6 May 12 16:45:50 mistral postgres[506]: [178-1] LOG: aborting startup due to startup process failure I also have many "recycled" messages: May 12 13:10:01 mistral postgres[16373]: [20-1] LOG: recycled transaction log file "0000000600000078" May 12 13:10:01 mistral postgres[16373]: [21-1] LOG: recycled transaction log file "0000000600000079" May 12 13:10:01 mistral postgres[16373]: [22-1] LOG: recycled transaction log file "000000060000007A" May 12 13:10:01 mistral postgres[16373]: [23-1] LOG: recycled transaction log file "000000060000007B" May 12 13:10:01 mistral postgres[16373]: [24-1] LOG: recycled transaction log file "000000060000007C" May 12 13:20:03 mistral postgres[16408]: [1-1] LOG: removing transaction log file "0000000600000090" May 12 13:20:03 mistral postgres[16408]: [2-1] LOG: removing transaction log file "0000000600000091" > Note that your calculation is off anyway: per documentation, recent > versions will allow up to 2*checkpoint_segments+1 WAL files to exist. > This is because we save the last two checkpoints worth of data. That's why I gave it an order of magnitude leeway. Such calculations are always off by a factor or two. It's doing it again at this very moment, I get the feeling that pg_xlog is converging on the same size as $PGDATA/base/$OID/pgsql_tmp: mistral:/backup_temp# du -hs pg_xlog /var/lib/postgres/data/base/17142/pgsql_tmp/ 4.8G pg_xlog 5.4G /var/lib/postgres/data/base/17142/pgsql_tmp -jwb ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)