Re: 'replication checkpoint has wrong magic' on the newly cloned replicas

On Thu, Nov 30, 2017, at 01:41, Andres Freund wrote:
> 
> > It is part of replication origins feature, which is fairly new stuff
> > (see src/backend/replication/logical/origin.c).  I'd bet this problem
> > is related to a bug in logical replication "origins" code rather than
> > any procedural problems in your base-backup taking setup ...
> 
> Possible.
> 
> What's the max_replication_origins setting? Is the system receiving
> logical replication data? Could you describe the setup a bit? Any chance
> the system's partially been running without fsync? Could you attach both
> a corrupt and a non-corrupt state file?

max_replication_slots is 5, and logical replication is not used there
at all. fsync is always turned on; the other configuration settings
from the master are attached.

The replica configuration is almost identical to the master's (we
decreased random_page_cost for systems running on SSDs).

diff /tmp/settings_master.txt /tmp/settings_replica.txt
115c115
< krb_server_keyfile    FILE:/server/postgres/9.6.5/etc/krb5.keytab
---
> krb_server_keyfile	FILE:/server/postgres/9.6.6/etc/krb5.keytab
186c186
< random_page_cost      3
---
> random_page_cost	1.5
194,195c194,195
< server_version        9.6.5
< server_version_num    90605
---
> server_version	9.6.6
> server_version_num	90606
222c222
< tcp_keepalives_interval       75
---
> tcp_keepalives_interval	90
239c239
< transaction_read_only off
---
> transaction_read_only	on
273c273
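
Because a plain diff is sensitive to tab-versus-space differences in the
dumps, a key-wise comparison can be easier to read. The following is only
a sketch: the function names are hypothetical, and it assumes each line of
a dump is a setting name followed by whitespace and an optional value, as
in the attached settings output.

```python
def parse_settings(text):
    """Parse 'name<whitespace>value' lines into a dict (value may be empty)."""
    settings = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if not parts:
            continue  # skip blank lines
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[name] = value
    return settings


def diff_settings(master_text, replica_text):
    """Return {name: (master_value, replica_value)} for settings that differ."""
    master = parse_settings(master_text)
    replica = parse_settings(replica_text)
    changed = {}
    for name in sorted(set(master) | set(replica)):
        if master.get(name) != replica.get(name):
            changed[name] = (master.get(name), replica.get(name))
    return changed


if __name__ == "__main__":
    # Tiny illustrative inputs; in practice read the two dump files instead.
    m = "random_page_cost\t3\nfsync\ton\n"
    r = "random_page_cost      1.5\nfsync   on\n"
    for name, (a, b) in diff_settings(m, r).items():
        print(f"{name}: master={a} replica={b}")
```

Non-setting lines in a dump (such as the psql timing line) are parsed like
any other line, so they may show up as spurious differences.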

The system is a typical OLTP system; the master normally has one
streaming physical replica and one delayed replica. When the issue
happened, the replica in question was the second physical replica; after
it was created, the previous replica was decommissioned.

Unfortunately, I don't have a 'corrupt' file from the replica, as its
data was reinitialized afterwards. I will try to reproduce the issue by
cloning it a couple more times. The replorigin_checkpoint from the
master is attached, but its magic seems to be fine:

od -x replorigin_checkpoint
0000000 dade 1257 b236 6a00
0000010

The same file from the current replica is identical.
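
For anyone who wants to repeat the magic check without reading od output
by eye, here is a minimal sketch. It assumes the expected value is
0x1257DADE (REPLICATION_STATE_MAGIC in origin.c) and that the file was
written on a little-endian machine; the helper names are hypothetical.

```python
import struct

# Assumption: expected magic, as defined by REPLICATION_STATE_MAGIC in
# src/backend/replication/logical/origin.c.
REPLICATION_STATE_MAGIC = 0x1257DADE


def read_magic(data: bytes) -> int:
    """Return the first 4 bytes of the file as a little-endian uint32."""
    if len(data) < 4:
        raise ValueError("file too short to contain a magic number")
    (magic,) = struct.unpack_from("<I", data, 0)
    return magic


def magic_ok(data: bytes) -> bool:
    """True if the file starts with the expected magic number."""
    return read_magic(data) == REPLICATION_STATE_MAGIC


if __name__ == "__main__":
    # od -x prints 16-bit words in host byte order, so on a little-endian
    # host "dade 1257 b236 6a00" corresponds to these bytes on disk.
    sample = bytes.fromhex("deda5712 36b2006a")
    print(hex(read_magic(sample)), magic_ok(sample))
```

To check a real file, pass the result of
open("replorigin_checkpoint", "rb").read() to magic_ok() instead of the
sample bytes.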

-- 
Sincerely,
Alex


Attachment: replorigin_checkpoint
Description: Binary data

Expanded display is used automatically.
allow_system_table_mods	off
application_name	psql
archive_command	/data/postgres/bin/sync-wal.sh "%p" "%f"
archive_mode	on
archive_timeout	300
array_nulls	on
authentication_timeout	60
autovacuum	on
autovacuum_analyze_scale_factor	0.02
autovacuum_analyze_threshold	50
autovacuum_freeze_max_age	200000000
autovacuum_max_workers	10
autovacuum_multixact_freeze_max_age	400000000
autovacuum_naptime	60
autovacuum_vacuum_cost_delay	20
autovacuum_vacuum_cost_limit	-1
autovacuum_vacuum_scale_factor	0.05
autovacuum_vacuum_threshold	50
autovacuum_work_mem	-1
backend_flush_after	0
backslash_quote	safe_encoding
bg_mon.listen_address	0.0.0.0
bg_mon.naptime	1
bg_mon.port	8080
bgwriter_delay	200
bgwriter_flush_after	64
bgwriter_lru_maxpages	100
bgwriter_lru_multiplier	2
block_size	8192
bonjour	off
bonjour_name	
bytea_output	hex
check_function_bodies	on
checkpoint_completion_target	0.8
checkpoint_flush_after	32
checkpoint_timeout	300
checkpoint_warning	30
client_encoding	UTF8
client_min_messages	notice
cluster_name	pgsql_shard1
commit_delay	0
commit_siblings	5
config_file	pgsql_shard1/9.6/data/postgresql.conf
constraint_exclusion	partition
cpu_index_tuple_cost	0.005
cpu_operator_cost	0.0025
cpu_tuple_cost	0.01
cursor_tuple_fraction	0.1
data_checksums	off
data_directory	pgsql_shard1/9.6/data
DateStyle	ISO, MDY
db_user_namespace	off
deadlock_timeout	1000
debug_assertions	off
debug_pretty_print	on
debug_print_parse	off
debug_print_plan	off
debug_print_rewritten	off
default_statistics_target	100
default_tablespace	
default_text_search_config	pg_catalog.english
default_transaction_deferrable	off
default_transaction_isolation	read committed
default_transaction_read_only	off
default_with_oids	off
dynamic_library_path	$libdir
dynamic_shared_memory_type	posix
effective_cache_size	12582912
effective_io_concurrency	4
enable_bitmapscan	on
enable_hashagg	on
enable_hashjoin	on
enable_indexonlyscan	on
enable_indexscan	on
enable_material	on
enable_mergejoin	on
enable_nestloop	on
enable_seqscan	on
enable_sort	on
enable_tidscan	on
escape_string_warning	on
event_source	PostgreSQL
exit_on_error	off
external_pid_file	
extra_float_digits	0
force_parallel_mode	off
from_collapse_limit	8
fsync	on
full_page_writes	on
geqo	on
geqo_effort	5
geqo_generations	0
geqo_pool_size	0
geqo_seed	0
geqo_selection_bias	2
geqo_threshold	12
gin_fuzzy_search_limit	0
gin_pending_list_limit	4096
hba_file	pgsql_shard1/9.6/data/pg_hba.conf
hot_standby	on
hot_standby_feedback	on
huge_pages	try
ident_file	pgsql_shard1/9.6/data/pg_ident.conf
idle_in_transaction_session_timeout	0
ignore_checksum_failure	off
ignore_system_indexes	off
integer_datetimes	on
IntervalStyle	postgres
join_collapse_limit	8
krb_caseins_users	off
krb_server_keyfile	FILE:/server/postgres/9.6.5/etc/krb5.keytab
lc_collate	en_US.UTF-8
lc_ctype	en_US.UTF-8
lc_messages	en_US.UTF8
lc_monetary	en_US.UTF8
lc_numeric	en_US.UTF8
lc_time	en_US.UTF8
listen_addresses	*
lo_compat_privileges	off
local_preload_libraries	
lock_timeout	0
log_autovacuum_min_duration	-1
log_checkpoints	on
log_connections	on
log_destination	csvlog
log_directory	pg_log
log_disconnections	off
log_duration	off
log_error_verbosity	default
log_executor_stats	off
log_file_mode	0644
log_filename	postgresql-%Y-%m-%d_%H%M%S.log
log_hostname	off
log_line_prefix	
log_lock_waits	on
log_min_duration_statement	500
log_min_error_statement	error
log_min_messages	warning
log_parser_stats	off
log_planner_stats	off
log_replication_commands	off
log_rotation_age	1440
log_rotation_size	102400
log_statement	all
log_statement_stats	off
log_temp_files	-1
log_timezone	Europe/Berlin
log_truncate_on_rotation	off
logging_collector	on
maintenance_work_mem	524288
max_connections	1200
max_files_per_process	1000
max_function_args	100
max_identifier_length	63
max_index_keys	32
max_locks_per_transaction	64
max_parallel_workers_per_gather	0
max_pred_locks_per_transaction	64
max_prepared_transactions	0
max_replication_slots	5
max_stack_depth	2048
max_standby_archive_delay	300000
max_standby_streaming_delay	300000
max_wal_senders	5
max_wal_size	384
max_worker_processes	8
min_parallel_relation_size	1024
min_wal_size	8
old_snapshot_threshold	-1
operator_precedence_warning	on
parallel_setup_cost	1000
parallel_tuple_cost	0.1
password_encryption	on
pg_stat_statements.max	10000
pg_stat_statements.save	on
pg_stat_statements.track	top
pg_stat_statements.track_utility	on
port	5432
post_auth_delay	0
pre_auth_delay	0
quote_all_identifiers	off
random_page_cost	3
replacement_sort_tuples	150000
restart_after_crash	on
row_security	on
search_path	zc_api_r17_00_46, public
segment_size	131072
seq_page_cost	1
server_encoding	UTF8
server_version	9.6.5
server_version_num	90605
session_preload_libraries	
session_replication_role	origin
shared_buffers	2097152
shared_preload_libraries	pg_stat_statements,bg_mon
sql_inheritance	on
ssl	on
ssl_ca_file	
ssl_cert_file	server.crt
ssl_ciphers	HIGH:MEDIUM:+3DES:!aNULL
ssl_crl_file	
ssl_ecdh_curve	prime256v1
ssl_key_file	server.key
ssl_prefer_server_ciphers	on
standard_conforming_strings	on
statement_timeout	600000
stats_temp_directory	pg_stat_tmp
superuser_reserved_connections	3
synchronize_seqscans	on
synchronous_commit	off
synchronous_standby_names	
syslog_facility	local0
syslog_ident	postgres
syslog_sequence_numbers	on
syslog_split_messages	on
tcp_keepalives_count	5
tcp_keepalives_idle	600
tcp_keepalives_interval	75
temp_buffers	1024
temp_file_limit	20971520
temp_tablespaces	
TimeZone	Europe/Berlin
timezone_abbreviations	Default
trace_notify	off
trace_recovery_messages	log
trace_sort	off
track_activities	on
track_activity_query_size	1024
track_commit_timestamp	off
track_counts	on
track_functions	all
track_io_timing	on
transaction_deferrable	off
transaction_isolation	read committed
transaction_read_only	off
transform_null_equals	off
unix_socket_directories	pgsql_shard1/9.6
unix_socket_group	
unix_socket_permissions	0777
update_process_title	on
vacuum_cost_delay	10
vacuum_cost_limit	200
vacuum_cost_page_dirty	20
vacuum_cost_page_hit	1
vacuum_cost_page_miss	10
vacuum_defer_cleanup_age	0
vacuum_freeze_min_age	50000000
vacuum_freeze_table_age	150000000
vacuum_multixact_freeze_min_age	5000000
vacuum_multixact_freeze_table_age	150000000
wal_block_size	8192
wal_buffers	2048
wal_compression	off
wal_keep_segments	0
wal_level	replica
wal_log_hints	off
wal_receiver_status_interval	10
wal_receiver_timeout	60000
wal_retrieve_retry_interval	5000
wal_segment_size	2048
wal_sender_timeout	60000
wal_sync_method	fdatasync
wal_writer_delay	200
wal_writer_flush_after	128
work_mem	16384
xmlbinary	base64
xmloption	content
zero_damaged_pages	off
Time: 14.248 ms
