Dear all.
I'm running PostgreSQL (with the timescaledb extension enabled) in the environment below.
The DB pod has crashed several times at irregular intervals.
I would like to find out the cause of these failures.
Environment:
- Base: Kubernetes (microk8s)
- DB: PostgreSQL 12.4 (TimescaleDB) Container
- DB OS: Alpine Linux
- microk8s host OS: CentOS 7.6, Amazon Linux 2 (the crash occurred on multiple hosts)
- DB data location: Mounted host directory (for data persistence)
1) PostgreSQL crashed and the Pod cannot start
I noticed that the timescaledb pod was restarting repeatedly with the following error:
---
PostgreSQL Database directory appears to contain a database; Skipping initialization
[1] LOG: starting PostgreSQL 12.4 on x86_64-pc-linux-musl, compiled by gcc (Alpine 9.3.0) 9.3.0, 64-bit
[1] LOG: listening on IPv4 address "0.0.0.0", port 5432
[1] LOG: listening on IPv6 address "::", port 5432
[1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
[20] LOG: database system was shut down at 2021-02-15 21:15:12 UTC
[20] LOG: invalid primary checkpoint record
[20] PANIC: could not locate a valid checkpoint record
[1] LOG: startup process (PID 20) was terminated by signal 6: Aborted
[1] LOG: aborting startup due to startup process failure
[1] LOG: database system is shut down
---
I suspected WAL trouble and checked with pg_controldata and pg_waldump.
---
$ pg_controldata <PostgreSQL data dir>
pg_control version number: 1201
Catalog version number: 201909212
Database system identifier: 6909006008117780509
Database cluster state: in production
pg_control last modified: Mon Feb 15 21:11:21 2021
Latest checkpoint location: 15/8FB002C0
Latest checkpoint's REDO location: 15/8FAF3018
Latest checkpoint's REDO WAL file: 00000001000000150000008F
Latest checkpoint's TimeLineID: 1
Latest checkpoint's PrevTimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:33988723
Latest checkpoint's NextOID: 117637
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 480
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 33988723
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid:0
Latest checkpoint's newestCommitTsXid:0
Time of latest checkpoint: Mon Feb 15 21:11:14 2021
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/0
Min recovery ending loc's timeline: 0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: replica
wal_log_hints setting: off
max_connections setting: 50
max_worker_processes setting: 15
max_wal_senders setting: 10
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
Data page checksum version: 0
Mock authentication nonce: 871e705d9393cdb1b161d5562656e20db314779af25942fa10d840983dc4ba84
---
I checked the latest checkpoint's REDO WAL file.
---
$ pg_waldump -n 10 pg_wal/00000001000000150000008F
pg_waldump: fatal: WAL segment size must be a power of two between 1 MB and 1 GB, but the WAL file "00000001000000150000008F" header specifies 0 bytes
---
I could not read the WAL data.
The file size is 16MB (matching the WAL segment size setting), but the content is all zeros.
I confirmed this with the "od" command:
---
$ od -N 40 -A d -v pg_wal/00000001000000150000008F
0000000 000000 000000 000000 000000 000000 000000 000000 000000
0000016 000000 000000 000000 000000 000000 000000 000000 000000
0000032 000000 000000 000000 000000
0000040
---
The previous WAL segment could be read with pg_waldump.
However, I could not find any record corresponding to the latest checkpoint location (15/8FB002C0) reported by pg_controldata.
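(For reference, I dumped the previous segment roughly like this; the exact options may have differed slightly.)
---
$ pg_waldump -n 10 pg_wal/00000001000000150000008E
---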
2) Running pg_resetwal allowed the Pod to start
I ran pg_resetwal to get past the WAL trouble.
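The invocation was roughly the following (run with the server stopped; I believe -f was needed because the cluster had not been shut down cleanly):
---
$ pg_resetwal -f <PostgreSQL data dir>
---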
After that, the DB pod started successfully.
However, SELECT queries on some tables failed.
3) Failed SELECT query
---
sampledb1=# select * from table1 limit 1;
ERROR: missing chunk number 0 for toast value 27678 in pg_toast_2619
---
I determined that this error was raised for the pg_statistic table (and its TOAST table, pg_toast_2619).
So I deleted the broken rows from pg_statistic and ran ANALYZE.
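The cleanup was roughly like this (table1 stands for the affected table; in practice I deleted only the rows that triggered the TOAST error):
---
sampledb1=# DELETE FROM pg_statistic WHERE starelid = 'table1'::regclass;
sampledb1=# ANALYZE table1;
---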
After that, the tables worked again.
4) Another data problem
I ran into another data problem after pg_resetwal.
On one table, SELECT queries returned no rows.
INSERT queries also failed (no response; they waited forever).
This table has only a primary key index.
I suspect the primary key index was corrupted after pg_resetwal.
I did not know how to repair the primary key index, so in the end I dropped the table and restored it.
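(Looking back, a reindex might have been an option, e.g. the following, but I did not try it at the time; <broken table> is just a placeholder.)
---
sampledb1=# REINDEX TABLE <broken table>;
---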
I suspected that this WAL trouble was caused by disk I/O problems, but no errors appeared in the OS syslog.
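(On the host I checked roughly like this; /var/log/messages is the CentOS 7 default location and may differ on Amazon Linux 2.)
---
$ dmesg | grep -i error
$ sudo grep -i error /var/log/messages
---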
I would like to know what other causes could explain this.
Best regards.