Calculating how much redo log space has been used

Reza Taheri <rtaheri@xxxxxxxxxx> · Tue, 14 Aug 2018 18:51:34 +0000

Hello PostgreSQL community,
I am helping with a benchmarking exercise using PGSQL (I chair
the TPC subcommittee that has released a benchmark using PGSQL). A requirement of the benchmark is having enough log space allocated for 8 hours of running without needing to archive, back up, etc. I am trying
 to a) figure out how I can establish the exact space usage for the auditor; and b) how I can reduce the log space usage. Looking at iostat and pgstatspack, it looks like we will need to allocate something like 1.5TB of log space for a 5TB database, which is
 a huge ratio. (Yes, in the real world, we’d probably archive or ship the logs; but for benchmarking, that doesn’t work)

pgstatspack gives me something like below:

background writer stats
 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc 
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
                22 |
              0 |           
6416768 |      
2252636 |               
0 |          
280211 |       
9786558
(1 row)

background writer relative stats
 checkpoints_timed | minutes_between_checkpoint | buffers_checkpoint | buffers_clean | buffers_backend | total_writes | avg_checkpoint_write 
-------------------+----------------------------+--------------------+---------------+-----------------+--------------+----------------------
 100%             
|                         
6 | 71%               
| 25%          
| 3%             
| 8.659 MB/s  
| 2278.000 MB

I can calculate how many checkpoint segments I have used from the MB/s. But is there a more direct way of seeing how/when a checkpoint segment is filled up and we move on to the next one?

Also, it looks like the full_page_writes parameter is the only thing that can help reduce the log usage size, but that I have to set it to 1 to avoid corruption after a system crash, which is a requirement.
 Another requirement is a very short, 6-minute checkpoint time, which means we will likely write the full page very often. Yes, my hands are tied!

Here are the relevant non-default settings:

shared_buffers = 18000MB          
# min 128kB
temp_buffers = 2MB                
# min 800kB
maintenance_work_mem = 5MB        
# min 1MB
bgwriter_delay = 10ms             
# 10-10000ms between rounds
bgwriter_lru_maxpages = 200       
# 0-1000 max buffers written/round
effective_io_concurrency = 10            
# 1-1000; 0 disables prefetching
wal_sync_method = open_datasync          
# the default is the first option
wal_buffers = 16MB                
# min 32kB, -1 sets based on shared_buffers
wal_writer_delay = 10ms                  
# 1-10000 milliseconds
checkpoint_segments = 750         
# in logfile segments, min 1, 16MB each
checkpoint_timeout = 6min         
# range 30s-1h
checkpoint_completion_target = 0.9
# checkpoint target duration, 0.0 - 1.0
effective_cache_size = 512MB
default_statistics_target = 10000 
# range 1-10000
log_destination = 'stderr'        
# Valid values are combinations of
logging_collector = on                   
# Enable capturing of stderr and csvlog
log_directory = 'pg_log'          
# directory where log files are written,
log_filename = 'postgresql-%a.log'
# log file name pattern,
log_truncate_on_rotation = on            
# If on, an existing log file with the
log_rotation_age = 1d             
# Automatic rotation of logfiles will
log_rotation_size = 0             
# Automatic rotation of logfiles will
log_checkpoints = on