Re: Strange times in WAL files in archive directory (9.3)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 24/01/2017 11:12, Achilleas Mantzios wrote:
Hello,
we are running PostgreSQL 9.3 on a new VM running Linux 3.16.0-4-amd64 #1 SMP, Debian 8.6, and we setup WAL archiving yesterday. We were planing to keep a window of 15 days worth of WAL files, so we thought of writing a script to delete files older than 15 days. The settings used were rather common :
archive_mode = on
archive_command = '/usr/bin/scp %p sma:/smadb/pgsql/pitr/%f'
archive_timeout=240
However, this morning, to my surprise I saw this by doing ls on the archive directory :
# ls -l /smadb/pgsql/pitr/ | less
total 12680296
-rw------- 1 postgres postgres 16777216 Jan 24 03:44 000000020000070100000000
-rw------- 1 postgres postgres 16777216 Jan 24 03:50 000000020000070100000001
-rw------- 1 postgres postgres 16777216 Jan 24 03:54 000000020000070100000002
-rw------- 1 postgres postgres 16777216 Jan 24 04:00 000000020000070100000003
-rw------- 1 postgres postgres 16777216 Jan 24 04:00 000000020000070100000004
-rw------- 1 postgres postgres 16777216 Jan 24 04:04 000000020000070100000005
-rw------- 1 postgres postgres 16777216 Jan 24 04:09 000000020000070100000006
-rw------- 1 postgres postgres 16777216 Jan 24 04:14 000000020000070100000007
-rw------- 1 postgres postgres 16777216 Jan 24 04:20 000000020000070100000008
-rw------- 1 postgres postgres 16777216 Jan 24 04:20 000000020000070100000009
-rw------- 1 postgres postgres 16777216 Jan 24 04:25 00000002000007010000000A
-rw------- 1 postgres postgres 16777216 Jan 24 04:29 00000002000007010000000B
-rw------- 1 postgres postgres 16777216 Jan 24 04:34 00000002000007010000000C
-rw------- 1 postgres postgres 16777216 Jan 24 04:39 00000002000007010000000D
-rw------- 1 postgres postgres 16777216 Jan 24 04:39 00000002000007010000000E
-rw------- 1 postgres postgres 16777216 Jan 24 04:44 00000002000007010000000F
...........
-rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000701000000EF
-rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000701000000F0
-rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000701000000F1  <------
-rw------- 1 postgres postgres 16777216 Jan 23 13:04 0000000200000701000000F2 <------ time jumps
-rw------- 1 postgres postgres 16777216 Jan 23 13:05 0000000200000701000000F3
-rw------- 1 postgres postgres 16777216 Jan 23 13:06 0000000200000701000000F4

So IMHO either the naming of the WAL filename got damaged somehow or the one of the servers got wrong time or something I miss. "Jan 23 13:04" is indeed the time that we started the server for WAL archiving.
By sorting by time I get :
ls -ltr /smadb/pgsql/pitr/ | less
-rw------- 1 postgres postgres 16777216 Jan 23 13:04 0000000200000701000000F2
-rw------- 1 postgres postgres 16777216 Jan 23 13:05 0000000200000701000000F3
-rw------- 1 postgres postgres 16777216 Jan 23 13:06 0000000200000701000000F4
-rw------- 1 postgres postgres 16777216 Jan 23 13:07 0000000200000701000000F5
-rw------- 1 postgres postgres 16777216 Jan 23 13:08 0000000200000701000000F6
-rw------- 1 postgres postgres 16777216 Jan 23 13:09 0000000200000701000000F7
-rw------- 1 postgres postgres 16777216 Jan 23 13:09 0000000200000701000000F8
-rw------- 1 postgres postgres 16777216 Jan 23 13:10 0000000200000701000000F9
-rw------- 1 postgres postgres 16777216 Jan 23 13:11 0000000200000701000000FA
-rw------- 1 postgres postgres 16777216 Jan 23 13:12 0000000200000701000000FB
-rw------- 1 postgres postgres 16777216 Jan 23 13:13 0000000200000701000000FC
........
-rw------- 1 postgres postgres 16777216 Jan 24 03:15 0000000200000702000000F9
-rw------- 1 postgres postgres 16777216 Jan 24 03:19 0000000200000702000000FA
-rw------- 1 postgres postgres 16777216 Jan 24 03:23 0000000200000702000000FB
-rw------- 1 postgres postgres 16777216 Jan 24 03:27 0000000200000702000000FC
-rw------- 1 postgres postgres 16777216 Jan 24 03:31 0000000200000702000000FD
-rw------- 1 postgres postgres 16777216 Jan 24 03:35 0000000200000702000000FE
-rw------- 1 postgres postgres 16777216 Jan 24 03:39 0000000200000702000000FF
-rw------- 1 postgres postgres 16777216 Jan 24 03:43 000000020000070300000000
-rw------- 1 postgres postgres 16777216 Jan 24 03:44 000000020000070100000000 <----- problem begins
-rw------- 1 postgres postgres 16777216 Jan 24 03:47 000000020000070300000001
-rw------- 1 postgres postgres 16777216 Jan 24 03:50 000000020000070100000001
-rw------- 1 postgres postgres 16777216 Jan 24 03:51 000000020000070300000002
-rw------- 1 postgres postgres 16777216 Jan 24 03:54 000000020000070100000002
........ (same pattern goes on 0000000200000703 mixing with 0000000200000701 )
-rw------- 1 postgres postgres 16777216 Jan 24 09:42 0000000200000701000000EE
-rw------- 1 postgres postgres 16777216 Jan 24 09:45 0000000200000703000000EF
-rw------- 1 postgres postgres 16777216 Jan 24 09:47 0000000200000703000000F0
-rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000703000000F1
-rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000701000000EF
-rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000701000000F0
-rw------- 1 postgres postgres 16777216 Jan 24 09:48 0000000200000701000000F1 <----- problem ends
-rw------- 1 postgres postgres 16777216 Jan 24 09:50 0000000200000703000000F2
-rw------- 1 postgres postgres 16777216 Jan 24 09:52 0000000200000703000000F3
-rw------- 1 postgres postgres 16777216 Jan 24 09:56 0000000200000703000000F4

Any thoughts about any explanation on this?

By scrolling up the terminal on the database server (cause the said file is gone now from pg_xlog) I saw this :
$ ls -ltr ~/data/pg_xlog/0000000200000701000000F0
-rw------- 1 postgres postgres 16777216 Jan 23 13:01 /var/lib/pgsql/data/pg_xlog/0000000200000701000000F0
$ md5sum ~/data/pg_xlog/0000000200000701000000F0
8ce1917c86d3c9e41455ec13659c2689 /var/lib/pgsql/data/pg_xlog/0000000200000701000000F0

In the archive's host I see this :
# ls -ltr /smadb/pgsql/pitr/0000000200000701000000F0
-rw------- 1 postgres postgres 16777216 Jan 24 09:48 /smadb/pgsql/pitr/0000000200000701000000F0
# md5sum /smadb/pgsql/pitr/0000000200000701000000F0
8ce1917c86d3c9e41455ec13659c2689 /smadb/pgsql/pitr/0000000200000701000000F0

Same file, different mod times, so, this makes me think that maybe there was some flushing that the postgresql server did, prior to deleting or recycling some old WAL files. Also I see this :
# ls -ltr /smadb/pgsql/pitr/0000000200000701000000*
-rw------- 1 postgres postgres 16777216 Jan 23 13:04 /smadb/pgsql/pitr/0000000200000701000000F2
-rw------- 1 postgres postgres 16777216 Jan 23 13:05 /smadb/pgsql/pitr/0000000200000701000000F3
-rw------- 1 postgres postgres 16777216 Jan 23 13:06 /smadb/pgsql/pitr/0000000200000701000000F4
-rw------- 1 postgres postgres 16777216 Jan 23 13:07 /smadb/pgsql/pitr/0000000200000701000000F5
-rw------- 1 postgres postgres 16777216 Jan 23 13:08 /smadb/pgsql/pitr/0000000200000701000000F6
-rw------- 1 postgres postgres 16777216 Jan 23 13:09 /smadb/pgsql/pitr/0000000200000701000000F7
-rw------- 1 postgres postgres 16777216 Jan 23 13:09 /smadb/pgsql/pitr/0000000200000701000000F8
-rw------- 1 postgres postgres 16777216 Jan 23 13:10 /smadb/pgsql/pitr/0000000200000701000000F9
-rw------- 1 postgres postgres 16777216 Jan 23 13:11 /smadb/pgsql/pitr/0000000200000701000000FA
-rw------- 1 postgres postgres 16777216 Jan 23 13:12 /smadb/pgsql/pitr/0000000200000701000000FB
-rw------- 1 postgres postgres 16777216 Jan 23 13:13 /smadb/pgsql/pitr/0000000200000701000000FC
-rw------- 1 postgres postgres 16777216 Jan 23 13:14 /smadb/pgsql/pitr/0000000200000701000000FD
-rw------- 1 postgres postgres 16777216 Jan 23 13:15 /smadb/pgsql/pitr/0000000200000701000000FE
-rw------- 1 postgres postgres 16777216 Jan 23 13:16 /smadb/pgsql/pitr/0000000200000701000000FF
-rw------- 1 postgres postgres 16777216 Jan 24 03:44 /smadb/pgsql/pitr/000000020000070100000000 <---- time jump
-rw------- 1 postgres postgres 16777216 Jan 24 03:50 /smadb/pgsql/pitr/000000020000070100000001
-rw------- 1 postgres postgres 16777216 Jan 24 03:54 /smadb/pgsql/pitr/000000020000070100000002
-rw------- 1 postgres postgres 16777216 Jan 24 04:00 /smadb/pgsql/pitr/000000020000070100000003
-rw------- 1 postgres postgres 16777216 Jan 24 04:00 /smadb/pgsql/pitr/000000020000070100000004
......
-rw------- 1 postgres postgres 16777216 Jan 24 09:42 /smadb/pgsql/pitr/0000000200000701000000EE
-rw------- 1 postgres postgres 16777216 Jan 24 09:48 /smadb/pgsql/pitr/0000000200000701000000EF
-rw------- 1 postgres postgres 16777216 Jan 24 09:48 /smadb/pgsql/pitr/0000000200000701000000F0
-rw------- 1 postgres postgres 16777216 Jan 24 09:48 /smadb/pgsql/pitr/0000000200000701000000F1

So this makes me believe that all the files before 0000000200000701000000F2 were present before WAL archiving was setup. So I want to ask :

1) Do you find this behavior normal?
2) From now on, can we assume a monotonic relation between file names and creation times in order to base our maintenance scripts on this?

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt



--
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux