Re: Failing streaming replication on PostgreSQL 14

Nicolas Seinlet <nicolas@xxxxxxxxxxx> · Mon, 22 Apr 2024 11:50:27 +0000

Hi,

I'm facing the same situation again, but this time analyzing the WAL with xxd shows a different pattern: there are no blocks of 0000.

The output of pg_waldump is:
pg_waldump: fatal: error in WAL record at 11C/93F9FF70: invalid magic number 0000 in log segment 000000010000011C00000093, offset 16384000
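
For reference, the segment name and offset in that message can be cross-checked against the LSN with a small sketch like the one below. It assumes the default 16 MB WAL segment size, the default 8 kB WAL page size and timeline 1; the helper names are only illustrative and are not part of any PostgreSQL tool.

```python
# Sketch only: maps an LSN to its WAL segment file and byte offset.
# Assumptions (not confirmed for this cluster): 16 MB segments, 8 kB WAL pages.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
WAL_PAGE_SIZE = 8192


def parse_lsn(text: str) -> int:
    """Turn an LSN such as '11C/93F9FF70' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline: int, lsn: int, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the segment file that contains the given LSN."""
    segno = lsn // seg_size
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )


def segment_offset(lsn: int, seg_size: int = WAL_SEGMENT_SIZE) -> int:
    """Byte offset of the LSN inside its segment file."""
    return lsn % seg_size


def next_page_offset(offset: int, page_size: int = WAL_PAGE_SIZE) -> int:
    """Offset of the first WAL page boundary after the given offset."""
    return (offset // page_size + 1) * page_size


if __name__ == "__main__":
    lsn = parse_lsn("11C/93F9FF70")
    off = segment_offset(lsn)
    print(wal_file_name(1, lsn), off, next_page_offset(off))
```

Under those assumptions, the record at 11C/93F9FF70 sits 144 bytes before the page boundary at offset 16384000, which is the offset pg_waldump reports. That suggests the invalid magic number is in the header of the following WAL page rather than in the record itself.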

The output of xxd -C16 is

00f9ff60: b364 0079 6e61 6d69 6320 6c80 0300 0000  .d.ynamic l.....
00f9ff70: 4000 0000 6659 a406 60f7 f993 1c01 0000  @...fY..`.......
00f9ff80: 000b 0000 82b3 8d9b 0020 1000 7f06 0000  ......... ......
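
To read the line at 00f9ff70 more easily, the first 24 bytes can be unpacked as an XLogRecord header. This is a rough sketch, assuming a little-endian machine and the PostgreSQL 14 header layout (xl_tot_len, xl_xid, xl_prev, xl_info, xl_rmid, two padding bytes, xl_crc); the function name is made up for illustration.

```python
import struct

# Bytes copied from the xxd lines at 00f9ff70 and 00f9ff80 (first 24 bytes).
RAW = bytes.fromhex(
    "40000000"          # xl_tot_len
    "6659a406"          # xl_xid
    "60f7f9931c010000"  # xl_prev
    "00"                # xl_info
    "0b"                # xl_rmid
    "0000"              # padding
    "82b38d9b"          # xl_crc
)


def decode_xlog_record_header(raw: bytes) -> dict:
    """Unpack a 24-byte record header, assuming little-endian byte order."""
    tot_len, xid, prev, info, rmid, crc = struct.unpack("<IIQBBxxI", raw[:24])
    return {
        "xl_tot_len": tot_len,
        "xl_xid": xid,
        "xl_prev": "%X/%X" % (prev >> 32, prev & 0xFFFFFFFF),
        "xl_info": info,
        "xl_rmid": rmid,
        "xl_crc": "%08X" % crc,
    }


if __name__ == "__main__":
    for name, value in decode_xlog_record_header(RAW).items():
        print(name, value)
```

Under these assumptions the header looks plausible: a 64-byte record whose xl_prev points to 11C/93F9F760, slightly earlier in the same segment.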

I'm still unable to determine the cause of the issue. I can't tell whether the primary server is sending a corrupted WAL segment, whether the secondary is receiving a corrupted WAL segment, whether the openzfs filesystem on the primary lets the WAL sender read a WAL segment that has not been written yet, or whether it is something else.

Is there any log option I can add on the two clusters to help me locate the issue's origin?

thanks,

Nicolas.

On Tuesday, April 16th, 2024 at 09:56, Nicolas Seinlet <nicolas@xxxxxxxxxxx> wrote:

> Hello,
>
> > What exactly is "cyphered ZFS"? Can you reproduce the problem with some
> > other filesystem? If it's something very unusual, it might well be a
> > bug in the filesystem.
>
> The filesystem is openzfs with native aes-256-gcm encryption:
> https://openzfs.github.io/openzfs-docs/man/master/7/zfsprops.7.html#encryption
>
> I've not tested if we get the same issue on another filesystem.
>
> I don't face the issue on Ubuntu 20.04/openzfs 0.8/PostgreSQL 12, but I have fewer systems with this deployment.
> On Ubuntu 22.04/openzfs 2.1.5/PostgreSQL 14, I face the issue from time to time, without knowing what triggers the error.
>
> thanks for helping,
>
> Nicolas.
