Re: [External] Re: standby replication server throws invalid memory alloc request size , does not start up

Vijaykumar Jain <vjain@xxxxxxxxxxxxx> · Thu, 28 Jun 2018 13:47:56 +0000

Thanks Scott.

fsync is on.

postgres=# show fsync;
 fsync
-------
 on
(1 row)

This is storage in QA, which is less powerful than prod. (It's on an Ubuntu 16.04 VM, vSAN / SSD, and the database is on an LVM partition.)

It is an ext4 filesystem.
/path/vgx-root ext4      108G  6.3G   97G   7% /

Wrt:
 - your disks might be cheating on sync commands (consumer grade disks are notorious for this)    
 - you might be using consumer-grade flash that can't flush its cache on power loss

OK, I am not very familiar with the quality of the hardware; I'll check with the team. We have not seen this issue in the at least 2 years that I have been here.
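
One rough way to get a first impression of whether the storage honours sync requests is to time fsync'd writes on the volume in question. The sketch below is only an illustration: the function name fsync_rate and its parameters are made up for this example, it assumes a Unix-like host with Python 3, and the number it prints proves nothing by itself. A very high rate can be legitimate on SSDs or arrays with a protected write cache, and a plausible rate does not guarantee safety on power loss. The pg_test_fsync utility that ships with PostgreSQL does a more thorough job of the same measurement.

```python
import os
import tempfile
import time


def fsync_rate(directory, writes=200, block=8192):
    """Return the number of fsync'd writes per second achieved in `directory`.

    Hypothetical helper for illustration only; it rewrites one 8 kB block
    (the default PostgreSQL page size) and calls fsync after every write.
    """
    payload = b"\0" * block
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.pwrite(fd, payload, 0)  # overwrite the same block at offset 0
            os.fsync(fd)               # ask the OS to flush it to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return writes / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Assumption: the temp directory stands in for the database volume;
    # point this at a scratch directory on the real data volume instead.
    rate = fsync_rate(tempfile.gettempdir())
    print(f"{rate:.0f} fsync'd 8 kB writes per second")
```

If the reported rate looks implausibly high for the underlying device, that is a hint (not proof) that some layer is acknowledging flushes before the data is durable, which is worth raising with the storage team.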

We do have multiple points of backup (daily to S3, barman, and 2 standbys for each primary).

----though not relevant to the bug but ...

This particular cluster is one of a kind, with a multi-master mesh (if you could call it that) setup.
Each region has one table that is logically replicated to the DB servers in all the other regions.

We also spin up only one DB per cluster for the same reason: it makes it easy to isolate problems and use cheap hardware (if that really is __)

But I'll check with the team what we should expect from the storage that we get.

Thanks Scott. I just wanted to confirm that this has nothing to do with a PG bug.

Vijay

On 6/28/18, 6:56 PM, "Scott Ribe" <scott_ribe@xxxxxxxxxxxxxxxx> wrote:

    You need to worry, very much, that your setup did not preserve data write ordering when the power went out.

    - you might have fsync turned off in PG
    - you might be using a non-journaled filesystem
    - your disks might be cheating on sync commands (consumer grade disks are notorious for this)
    - you might be using consumer-grade flash that can't flush its cache on power loss

    This is highly unlikely to be a PG bug, and you do need to worry that it could have been the master if you had been less lucky.

    --
    Scott Ribe
    scott_ribe@xxxxxxxxxxxxxxxx
    https://www.linkedin.com/in/scottribe/

    > On Jun 28, 2018, at 7:17 AM, Vijaykumar Jain <vjain@xxxxxxxxxxxxx> wrote:
    > 
    > My only concern was do I need to worry about this error showing up again?