On Thu, Nov 30, 2017, at 00:22, Alvaro Herrera wrote:
> Alex Kliukin wrote:
>
> > 2017-11-15 13:15:46.673 CET,,,15154,,5a0c2ff1.3b32,5,,2017-11-15
> > 13:15:45 CET,,0,PANIC,XX000,"replication checkpoint has wrong magic
> > 5714534 instead of 307747550",,,,,,,,,""
>
> Uhh ... I had never heard of this "replication checkpoint" thing. It is
> part of replication origins feature, which is fairly new stuff (see
> src/backend/replication/logical/origin.c). I'd bet this problem is
> related to a bug in logical replication "origins" code rather than any
> procedural problems in your base-backup taking setup ...

We are not using logical replication or logical decoding on those hosts.
On the master, both pg_replication_origin and pg_replication_slots are
empty. Those masters were upgraded from 9.3 fairly recently (around 2
months ago).

>
> I wonder if there is some truncation of the 0x1257DADE value that
> produces the 5714534 value you're seeing -- something related to a
> pg_logical/replorigin_checkpoint file being written partially while the
> backup is being taken.

307747550 = 0x1257DADE
0001 0010 0101 0111 1101 1010 1101 1110

5714534 = 0x573266 = "W2f" in ASCII
0000 0000 0101 0111 0011 0010 0110 0110

I see no pattern here, so a simple truncation looks unlikely.

What is interesting is that 0x573266 is actually mentioned in relcache.c:

#define RELCACHE_INIT_FILENAME "pg_internal.init"
#define RELCACHE_INIT_FILEMAGIC 0x573266 /* version ID value */

It is the file magic for the relcache init files, but given that the
copy is performed by just compressing and decompressing the original
files, I don't see how the software could confuse the two.

>
> Another point towards not including pg_logical/ contents when taking a
> base backup, I guess ...

In our case, wouldn't it just mask the real issue?

--
Sincerely,
Alex
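
For anyone who wants to re-check the arithmetic above, here is a minimal Python sketch that prints the hex, binary and ASCII forms of the two magic values. The names EXPECTED_MAGIC and describe_magic are made up for this illustration and are not PostgreSQL identifiers; only the numeric values and RELCACHE_INIT_FILEMAGIC come from the log line and from relcache.c as quoted.

```python
import struct

# Values taken from the PANIC message and from relcache.c as quoted above.
# EXPECTED_MAGIC is a hypothetical name chosen for this sketch.
EXPECTED_MAGIC = 307747550          # "instead of 307747550" in the log line
OBSERVED_MAGIC = 5714534            # "wrong magic 5714534" in the log line
RELCACHE_INIT_FILEMAGIC = 0x573266  # from relcache.c


def describe_magic(value):
    """Return (hex string, nibble-grouped 32-bit binary, printable ASCII).

    The ASCII form is built from the big-endian bytes of the value with
    leading zero bytes dropped; non-printable bytes are shown as '.'.
    """
    bits = format(value, "032b")
    grouped = " ".join(bits[i:i + 4] for i in range(0, 32, 4))
    raw = struct.pack(">I", value).lstrip(b"\x00")
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
    return hex(value), grouped, text


if __name__ == "__main__":
    for name, value in (("expected", EXPECTED_MAGIC),
                        ("observed", OBSERVED_MAGIC)):
        hex_form, binary_form, ascii_form = describe_magic(value)
        print(name, value, hex_form, binary_form, repr(ascii_form))
    # The observed value is exactly the relcache init file magic.
    print("matches relcache magic:", OBSERVED_MAGIC == RELCACHE_INIT_FILEMAGIC)
```

This only confirms that the observed value is exactly the relcache init file magic rather than a shifted or masked form of 0x1257DADE; it says nothing about how the two files could have been mixed up.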