Hi,
I suspect I am having a case of data corruption. Here are the details: postgresql94-9.4.8-1PGDG.rhel7.x86_64
postgresql94-contrib-9.4.8-1PGDG.rhel7.x86_64
postgresql94-libs-9.4.8-1PGDG.rhel7.x86_64
postgresql94-server-9.4.8-1PGDG.rhel7.x86_64
I have a cluster of 3 databases, db1 and db2 are masters and replicate between each other and also replicate to db3 (db1 <-> db2, db1 -> db3, db2 -> db3).
For replication I am using Bucardo.
The problem I am having is that one record in a table it shows a wrong value for one single field:
select gameplayid, transactionid, encodedplay from mytable where transactionid in (75315811, 75315815) order by transactionid;
gameplayid | transactionid | encodedplay
------------+---------------+--------------
160019239 | 75315811 | mix:5,2,7
160019237 | 75315811 | mix:5,4,8
160019235 | 75315811 | mix:6,2,9
160019233 | 75315811 | mix:1,9,8
160019271 | 75315815 | mix:9,0,9
160019269 | 75315815 | mix:9,8,9
160019267 | 75315815 | mix:9,2,2
160019265 | 75315815 | mix:2,2,8
160019263 | 75315811 | backup:1,9,1
160019261 | 75315815 | backup:2,0,9
select gameplayid, transactionid, encodedplay from mytable where transactionid in (75315811, 75315815) order by transactionid;
gameplayid | transactionid | encodedplay
------------+---------------+--------------
160019239 | 75315811 | mix:5,2,7
160019237 | 75315811 | mix:5,4,8
160019235 | 75315811 | mix:6,2,9
160019233 | 75315811 | mix:1,9,8
160019271 | 75315815 | mix:9,0,9
160019269 | 75315815 | mix:9,8,9
160019267 | 75315815 | mix:9,2,2
160019265 | 75315815 | mix:2,2,8
160019263 | 75315811 | backup:1,9,1
160019261 | 75315815 | backup:2,0,9
select gameplayid, transactionid, encodedplay from mytable where transactionid in (75315815) order by transactionid;
gameplayid | transactionid | encodedplay
------------+---------------+--------------
160019271 | 75315815 | mix:9,0,9
160019269 | 75315815 | mix:9,8,9
160019267 | 75315815 | mix:9,2,2
160019265 | 75315815 | mix:2,2,8
160019263 | 75315811 | backup:1,9,1
160019261 | 75315815 | backup:2,0,9
select gameplayid, transactionid, encodedplay from mytable where transactionid in (75315811) order by transactionid;
gameplayid | transactionid | encodedplay
------------+---------------+--------------
160019239 | 75315811 | mix:5,2,7
160019237 | 75315811 | mix:5,4,8
160019235 | 75315811 | mix:6,2,9
160019233 | 75315811 | mix:1,9,8
gameplayid | transactionid | encodedplay
------------+---------------+--------------
160019271 | 75315815 | mix:9,0,9
160019269 | 75315815 | mix:9,8,9
160019267 | 75315815 | mix:9,2,2
160019265 | 75315815 | mix:2,2,8
160019263 | 75315811 | backup:1,9,1
160019261 | 75315815 | backup:2,0,9
select gameplayid, transactionid, encodedplay from mytable where transactionid in (75315811) order by transactionid;
gameplayid | transactionid | encodedplay
------------+---------------+--------------
160019239 | 75315811 | mix:5,2,7
160019237 | 75315811 | mix:5,4,8
160019235 | 75315811 | mix:6,2,9
160019233 | 75315811 | mix:1,9,8
So the record with gameplayid = 160019263 have a wrong transactionid, 75315811 instead of 75315815.
The correct value is 75315815 and that I know because of the following facts:
- on db1 and db2 transactionid = 75315815 for gameplayid = 160019263,
- on db1 and db2 transactionid = 75315815 for gameplayid = 160019263,
- this table gets mostly inserts, very rare updates and only on other 2 fields not this one.
- there is another parent table that shows the number of records in this table which is 4 for transactionid =75315811 and 6 for transactionid = 7531581.
This table has an index by transactionid and that index seem correct because the filtering and the ordering are fine (like the filed has the correct value)...
What puzzles me is that the value that shows in this field is a real value from another record...
I only caught this issue because I have a script that runs in the night that compares the databases ...
By now I updated the field with the correct value and everything seem stable.
Postgres logs don't have any information about file corruption or any other kind of error. I also checked other logs on the system and I could not find any traces of corruption.
select name, setting from pg_settings order by 1;
name | setting
-------------------------------------+-----------------------------------------------------------------------------------------------------------------
allow_system_table_mods | off
application_name | psql
archive_command | test -f /cbnDBscripts/tmp/PITR_primarydb_stop_backup || rsync --timeout=60 -atz %p stldrdb:/data01/wal_files/%f
archive_mode | on
archive_timeout | 60
array_nulls | on
authentication_timeout | 60
autovacuum | on
autovacuum_analyze_scale_factor | 0.1
autovacuum_analyze_threshold | 50
autovacuum_freeze_max_age | 200000000
autovacuum_max_workers | 3
autovacuum_multixact_freeze_max_age | 400000000
autovacuum_naptime | 60
autovacuum_vacuum_cost_delay | 20
autovacuum_vacuum_cost_limit | -1
autovacuum_vacuum_scale_factor | 0.2
autovacuum_vacuum_threshold | 50
autovacuum_work_mem | -1
backslash_quote | safe_encoding
bgwriter_delay | 200
bgwriter_lru_maxpages | 100
bgwriter_lru_multiplier | 2
block_size | 8192
bonjour | off
bonjour_name |
bytea_output | hex
check_function_bodies | on
checkpoint_completion_target | 0.5
checkpoint_segments | 16
checkpoint_timeout | 300
checkpoint_warning | 30
client_encoding | UTF8
client_min_messages | error
commit_delay | 0
commit_siblings | 5
config_file | /data01/postgres/postgresql.conf
constraint_exclusion | partition
cpu_index_tuple_cost | 0.005
cpu_operator_cost | 0.0025
cpu_tuple_cost | 0.01
cursor_tuple_fraction | 0.1
data_checksums | off
data_directory | /data01/postgres
DateStyle | ISO, MDY
db_user_namespace | off
deadlock_timeout | 1000
debug_assertions | off
debug_pretty_print | on
debug_print_parse | off
debug_print_plan | off
debug_print_rewritten | off
default_statistics_target | 100
default_tablespace |
default_text_search_config | pg_catalog.english
default_transaction_deferrable | off
default_transaction_isolation | read committed
default_transaction_read_only | off
default_with_oids | off
dynamic_library_path | $libdir
dynamic_shared_memory_type | posix
effective_cache_size | 1048576
effective_io_concurrency | 1
enable_bitmapscan | on
enable_hashagg | on
enable_hashjoin | on
enable_indexonlyscan | on
enable_indexscan | on
enable_material | on
enable_mergejoin | on
enable_nestloop | on
enable_seqscan | off
enable_sort | on
enable_tidscan | on
escape_string_warning | on
event_source | PostgreSQL
exit_on_error | off
external_pid_file |
extra_float_digits | 0
from_collapse_limit | 8
fsync | on
full_page_writes | on
geqo | on
geqo_effort | 5
geqo_generations | 0
geqo_pool_size | 0
geqo_seed | 0
geqo_selection_bias | 2
geqo_threshold | 12
gin_fuzzy_search_limit | 0
hba_file | /data01/postgres/pg_hba.conf
hot_standby | off
hot_standby_feedback | off
huge_pages | try
ident_file | /data01/postgres/pg_ident.conf
ignore_checksum_failure | off
ignore_system_indexes | off
integer_datetimes | on
IntervalStyle | postgres
join_collapse_limit | 8
krb_caseins_users | off
krb_server_keyfile | FILE:/etc/sysconfig/pgsql/krb5.keytab
lc_collate | en_US.UTF-8
lc_ctype | en_US.UTF-8
lc_messages | en_US.UTF-8
lc_monetary | en_US.UTF-8
lc_numeric | en_US.UTF-8
lc_time | en_US.UTF-8
listen_addresses | *
local_preload_libraries |
lock_timeout | 0
lo_compat_privileges | off
log_autovacuum_min_duration | -1
log_checkpoints | off
log_connections | on
log_destination | csvlog
log_directory | pg_log
log_disconnections | on
log_duration | off
log_error_verbosity | default
log_executor_stats | off
log_file_mode | 0600
log_filename | postgresql-%d.log
logging_collector | on
log_hostname | off
log_line_prefix | %t %d %u
log_lock_waits | off
log_min_duration_statement | 2000
log_min_error_statement | error
log_min_messages | error
log_parser_stats | off
log_planner_stats | off
log_rotation_age | 1440
log_rotation_size | 0
log_statement | none
log_statement_stats | off
log_temp_files | -1
log_timezone | America/St_Lucia
log_truncate_on_rotation | on
maintenance_work_mem | 131072
max_connections | 300
max_files_per_process | 1000
max_function_args | 100
max_identifier_length | 63
max_index_keys | 32
max_locks_per_transaction | 64
max_pred_locks_per_transaction | 64
max_prepared_transactions | 0
max_replication_slots | 0
max_stack_depth | 2048
max_standby_archive_delay | 30000
max_standby_streaming_delay | 30000
max_wal_senders | 0
max_worker_processes | 8
password_encryption | on
port | 5432
post_auth_delay | 0
pre_auth_delay | 0
quote_all_identifiers | off
random_page_cost | 4
restart_after_crash | on
search_path | "$user",public,hstore,dblink,www_fdw
segment_size | 131072
seq_page_cost | 1
server_encoding | UTF8
server_version | 9.4.8
server_version_num | 90408
session_preload_libraries |
session_replication_role | origin
shared_buffers | 262144
shared_preload_libraries |
sql_inheritance | on
ssl | off
ssl_ca_file |
ssl_cert_file | server.crt
ssl_ciphers | HIGH:MEDIUM:+3DES:!aNULL
ssl_crl_file |
ssl_ecdh_curve | prime256v1
ssl_key_file | server.key
ssl_prefer_server_ciphers | on
ssl_renegotiation_limit | 0
standard_conforming_strings | on
statement_timeout | 0
stats_temp_directory | pg_stat_tmp
superuser_reserved_connections | 3
synchronize_seqscans | on
synchronous_commit | on
synchronous_standby_names |
syslog_facility | local0
syslog_ident | postgres
tcp_keepalives_count | 0
tcp_keepalives_idle | 0
tcp_keepalives_interval | 0
temp_buffers | 1024
temp_file_limit | -1
temp_tablespaces |
TimeZone | America/St_Lucia
timezone_abbreviations | Default
trace_notify | off
trace_recovery_messages | log
trace_sort | off
track_activities | on
track_activity_query_size | 1024
track_counts | on
track_functions | none
track_io_timing | off
transaction_deferrable | off
transaction_isolation | read committed
transaction_read_only | off
transform_null_equals | off
unix_socket_directories | /var/run/postgresql, /tmp
unix_socket_group |
unix_socket_permissions | 0777
update_process_title | on
vacuum_cost_delay | 0
vacuum_cost_limit | 200
vacuum_cost_page_dirty | 20
vacuum_cost_page_hit | 1
vacuum_cost_page_miss | 10
vacuum_defer_cleanup_age | 0
vacuum_freeze_min_age | 50000000
vacuum_freeze_table_age | 150000000
vacuum_multixact_freeze_min_age | 5000000
vacuum_multixact_freeze_table_age | 150000000
wal_block_size | 8192
wal_buffers | 128
wal_keep_segments | 0
wal_level | hot_standby
wal_log_hints | off
wal_receiver_status_interval | 10
wal_receiver_timeout | 60000
wal_segment_size | 2048
wal_sender_timeout | 60000
wal_sync_method | fdatasync
wal_writer_delay | 200
work_mem | 2048
xmlbinary | base64
xmloption | content
zero_damaged_pages | off
(239 rows)
Any suggestions, thoughts?