Re: Why so long between archive calls?

"Chris Hoover" <revoohc@xxxxxxxxx> · Thu, 7 Sep 2006 16:02:34 -0400

Tom,

I am able to tell how far behind the archiving is running since every minute, I copy the current archive_logs to a backup directory.  This is allowing me to be able to do an up to the minute pitr if required.  The archive_command removes the file from this backup dir when the file is archived by PostgreSQL.

Here my script for that:
#!/bin/bash

## archive the current log file every minute to make sure we can recover within 1 minute

###set -vx

source ~postgres/etc/postgres.profile

date >~/log

if [ -z $1 ]
then

  echo "archive_current_pg_xlog.sh <port>"
  exit

fi

if [ -f /tmp/archive_log ]
then

  echo "archive is already running - exiting"
  echo "`/bin/hostname` - archive is running long" | mail -s "`/bin/hostname` problem" $DBAPAGER $DBAMAIL

  exit
fi

touch /tmp/archive_log

serverPort=$1
serverName=`/bin/hostname`

currentArchiveDir=`echo ${archiveBaseDir}/${serverName}/${serverPort}/current_log`

echo "currentArchiveDir: "$currentArchiveDir >> ~/log

mkdir -p $currentArchiveDir

find $DB_PATH/50001/pg_xlog/ -cmin 1 -type f | grep -v "archive_status" > ~/cp_list
ls $DB_PATH/50001/pg_xlog/archive_status/*.done | while read FILE
do
  FILE2=`echo $FILE|sed 's/\.done//'`

  cat ~/cp_list | grep -v $FILE2 > ~/cp_list2
  mv ~/cp_list2 ~/cp_list
  rm $currentArchiveDir/$FILE2 2>/dev/null
done

cat ~/cp_list | while read FILE
do

  echo "archiving $FILE" >> ~/log

  cp $FILE ${currentArchiveDir}/.

done

find ${archiveBaseDir}/${serverName}/${serverPort}/pg_xlog_arch/ -type f -mmin -5 |
  sed 's/.*pg_xlog_arch\///' |
  while read FILE
    do
      rm -v ${currentArchiveDir}/$FILE >> log 2>/dev/null

    done

rm /tmp/archive_log

date >> ~/log

--------------------

Here is my postgresql.conf

listen_addresses = '*' 
port = 50001
max_connections = 256
shared_buffers = 32768  

max_prepared_transactions = 256         

work_mem = 4096                 
maintenance_work_mem = 1048576          
max_fsm_pages = 8000000                 
max_fsm_relations = 10000               
vacuum_cost_delay = 1000                        

vacuum_cost_page_hit = 1                
vacuum_cost_page_miss = 10              
vacuum_cost_page_dirty = 20             
vacuum_cost_limit = 8000                        
wal_buffers = 64                        

checkpoint_segments = 256               
checkpoint_timeout = 1800               
archive_command = '/home/postgres/bin/archive_pg_xlog.sh %p %f 50001'   
effective_cache_size = 383490           
random_page_cost = 2                    

default_statistics_target = 100         
constraint_exclusion = on
redirect_stderr = on  
log_directory = 'pg_log' 
log_truncate_on_rotation = on 
log_rotation_age = 1440 
log_rotation_size = 1048576 

log_min_messages = debug2 
log_min_duration_statement = 0 
log_connections = on
log_disconnections = on
log_duration = on
log_line_prefix = '%d,%p,%u,%m,%c,%l,%s,%x,%i,' 
log_statement = 'all'              

stats_start_collector = on
stats_command_string = on
stats_block_level = on
stats_row_level = on
stats_reset_on_server_start = on
autovacuum = on                 
autovacuum_naptime = 60         

autovacuum_vacuum_threshold = 1000      
autovacuum_analyze_threshold = 500      
autovacuum_vacuum_scale_factor = 0.001  
autovacuum_analyze_scale_factor = 0.0005        
autovacuum_vacuum_cost_delay = -1       

autovacuum_vacuum_cost_limit = -1       
statement_timeout = 0 
lc_messages = 'C'      
lc_monetary = 'C'            
lc_numeric = 'C'  
lc_time = 'C'        
add_missing_from = on

Chris

On 9/6/06, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:

"Chris Hoover" <revoohc@xxxxxxxxx> writes:
> Currenty, there are no .ready files in the pg_xlog/archive_status file.

Well, that explains why the archiver thinks it has nothing to do.

> Like I mentioned, the db server was stoped and restarted on 9/5 am, but the
> oldest unarchived log file is:
> -rw-------  1 postgres postgres 16M Sep  1 04:04 0000000100000197000000F1

Are you sure that's an unarchived file, and not a file waiting to be

used later?  What's the current WAL endpoint address?  (pg_controldata
would tell you ... but your trace showed 000000010000019700000078 being
archived, which would suggest that this one is just waiting to be reused.)

Have you got a particularly large checkpoint_segments setting?

                        regards, tom lane