2010/11/1 hubert depesz lubaczewski <depesz@xxxxxxxxxx>: > Hi > have strange situation - too many xlog files. > > PostgreSQL 8.3.11 on i386-pc-solaris2.10, compiled by cc -Xa > > config: > # select name, setting from pg_settings where name ~ 'checkpoint|wal' order by 1; > name | setting > ------------------------------+--------------- > checkpoint_completion_target | 0.9 > checkpoint_segments | 100 > checkpoint_timeout | 900 > checkpoint_warning | 30 > log_checkpoints | on > wal_buffers | 2048 > wal_sync_method | open_datasync > wal_writer_delay | 200 > (8 rows) > > as I understand, max number of xlog files in pg_xlog should be ( 1 + 2 * > checkpoint_segments ). (2 + checkpoint_completion_target) * checkpoint_segments + 1 => 291 > And in our case - it's more. > > Added cronjob to log data about number of segments, current segment > name, number of segments in pg_xlog that are before current, and after > current. script is: > > ------------------------------------------------------------ > #!/usr/bin/bash > LOGFILE=/home/postgres/logs/check_pg_xlog.out > > LS_OUTPUT=$( ls -l /pgdata/main/pg_xlog | egrep -v "xlogtemp|backup|status|total" | sort -k9 ) > FIRST_SEGMENT_LINE=$( echo "$LS_OUTPUT" | head -1 ) > LAST_SEGMENT_LINE=$( echo "$LS_OUTPUT" | tail -1 ) > > FIRST_SEGMENT=$( echo "$FIRST_SEGMENT_LINE" | awk '{print $NF}' ) > LAST_SEGMENT=$( echo "$LAST_SEGMENT_LINE" | awk '{print $NF}' ) > FIRST_SEGMENT_NUM=$( echo "$FIRST_SEGMENT" | awk '{print $NF}' | cut -b 9-16,23-24 ) > LAST_SEGMENT_NUM=$( echo "$LAST_SEGMENT" | awk '{print $NF}' | cut -b 9-16,23-24 ) > > SEGMENT_COUNT=$( printf $'ibase=16\n1 + %s - %s\n' $LAST_SEGMENT_NUM $FIRST_SEGMENT_NUM | bc ) > CURRENT_WAL_FILE=$( /opt/pgsql8311/bin/psql -U postgres -qAtX -c 'select file_name from pg_xlogfile_name_offset( pg_current_xlog_location())' ) > CURRENT_WAL_FILE_NUM=$( echo "$CURRENT_WAL_FILE" | cut -b 9-16,23-24 ) > > SEGMENTS_BEFORE_CURRENT=$( printf $'ibase=16\n%s - %s\n' $CURRENT_WAL_FILE_NUM $FIRST_SEGMENT_NUM | bc ) > SEGMENTS_AFTER_CURRENT=$( printf $'ibase=16\n%s - %s\n' $LAST_SEGMENT_NUM $CURRENT_WAL_FILE_NUM | bc ) > > CURRENT_SEGMENT_LINE=$( echo "$LS_OUTPUT" | grep "$CURRENT_WAL_FILE" ) > ( > date > printf $'First segment : %s\n' "$FIRST_SEGMENT_LINE" > printf $'Current segment : %s\n' "$CURRENT_SEGMENT_LINE" > printf $'Last segment : %s\n' "$LAST_SEGMENT_LINE" > printf $'Segment count : %s\n' "$SEGMENT_COUNT" > printf $'Current wal segment : %s\n' "$CURRENT_WAL_FILE" > printf $'Segments before current : %s\n' "$SEGMENTS_BEFORE_CURRENT" > printf $'Segments after current : %s\n' "$SEGMENTS_AFTER_CURRENT" > printf $'Last checkpoint time : %s\n' "$( /opt/pgsql8311/bin/pg_controldata /pgdata/main | egrep '^Time of latest checkpoint:' | sed 's/^[^:]*: *//' )" > /opt/pgsql8311/bin/psql -U postgres -c "select name, setting from pg_settings where name = any('{checkpoint_timeout,checkpoint_segments,archive_mode,archive_command}')" > ) >> $LOGFILE > ------------------------------------------------------------ > > > sample output looks like this: > > | Mon Nov 1 13:46:00 EDT 2010 > | First segment : -rw------- 1 postgres postgres 16777216 Nov 1 13:16 000000010000376700000053 > | Current segment : -rw------- 1 postgres postgres 16777216 Nov 1 13:45 000000010000376700000064 > | Last segment : -rw------- 1 postgres postgres 16777216 Nov 1 13:01 000000010000376800000029 > | Segment count : 215 > | Current wal segment : 000000010000376700000064 > | Segments before current : 17 > | Segments after current : 197 > | Last checkpoint time : Mon Nov 01 13:31:29 2010 > | name | setting > | ---------------------+--------------- > | archive_command | /usr/bin/true > | archive_mode | on > | checkpoint_segments | 100 > | checkpoint_timeout | 900 > | (4 rows) > > As you can see, now we have 215 segments, with 17 that represent wal before current location and 197 that are after current segment! > > Here - you can see graph which plots number of wal segments in the last week http://depesz.com/various/bad-wal.jpg > > it virtually never goes below 215, and it spikes to 270-300. > > In here: http://www.depesz.com/various/bad-wal.log.gz is log from my test script since 20th of october. > > Any ideas why number of segments is higher than expected? > > Just so that I am clear: I do not want to lower it by changing > checkpoint_segments. I'm looking for information/enlightenment about why > it works the way it works, and what could be possibly wrong. > > Best regards, > > depesz > > -- > Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/ > jid/gtalk: depesz@xxxxxxxxxx / aim:depeszhdl / skype:depesz_hdl / gg:6749007 > > -- > Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ ; PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general