Re: [ext] Re: Losing data because of problematic configuration?

"Holtgrewe, Manuel" <manuel.holtgrewe@xxxxxxxxxxxxxx> · Tue, 15 Jun 2021 11:40:33 +0000

Hi,

thanks for your answer.

Let me give some background. I have a postgres instance that serves as the data storage for a web-based data analytics application. For some queries, I'm seeing postgres going OOM because the query grows too large and subsequently the linux kernel kills
 the postgres system. I'm then observing similar log messages and data loss. I'm now trying to reproduce this behaviour in a more deterministic way to learn about the root cause and resolve the issue.

The bulk import is not running inside of a transaction for performance reasons. My understanding is that in case of crashes I might end up with partial data (that I take care of in the application). However, I would not expect rollback behaviour a few minutes
 after the bulk import went through correctly.

FWIW, the analytics application also allows users to annotate the data and these annotations are written to the database in transactions.

So to answer your questions:

> What kind of transaction did you use?

No transaction for bulk import. Also, bulk import completed minutes before the kill. After the bulk import, a number of transactions are performed touching different tables.

> Did you commit the transaction?

The bulk import was not done in a transaction. The other transactions were committed through the database access framework I'm using in my (Python/django) application.

> Why?

To reproduce the problematic beahaviour that I'm seeing in my application.

Does this help? Where could I look for understanding this better?

Thanks,
Manuel

From: Ron <ronljohnsonjr@xxxxxxxxx>

Sent: Tuesday, June 15, 2021 13:07

To: pgsql-general@xxxxxxxxxxxxxxxxxxxx

Subject: [ext] Re: Losing data because of problematic configuration?

On 6/15/21 5:42 AM, Holtgrewe, Manuel wrote:

Hi,

I have a database that is meant to have high-performance for bulk insert operations. I've attached my postgres.conf file.

However, I'm seeing the following behaviour. At around 12:04, I have started the database. Then, I did a bulk insert and that completed.

What kind of transaction did you use?

Did you commit the transaction?

I then went on to kill postgres processes at 12:33 with SIGSEGV signal.

Why?

Did you CHECKPOINT beforehand?

(I'm hypothesizing that data didn't get flushed to disk, and so Pg "cleaned itself up" after the crash.)

I'm getting the following log excerpt:

< 2021-06-15 12:33:03.656 CEST > LOG:  database system was interrupted; last known up at 2021-06-15 12:04:13 CEST

< 2021-06-15 12:33:03.656 CEST > DEBUG:  removing all temporary WAL segments

< 2021-06-15 12:33:04.525 CEST > DEBUG:  checkpoint record is at 60/7C377C78
[snip]

< 2021-06-15 12:33:04.537 CEST > DEBUG:  resetting unlogged relations: cleanup 1 init 0

< 2021-06-15 12:33:04.553 CEST > DEBUG:  unlinked file "base/16384/107877"

[... snip ... ]

< 2021-06-15 12:33:27.556 CEST > DEBUG:  copying base/16384/80948_init to base/16384/80948

< 2021-06-15 12:33:36.705 CEST > DEBUG:  performing replication slot checkpoint

< 2021-06-15 12:33:38.394 CEST > DEBUG:  attempting to remove WAL segments older than log file 00000000000000600000007C

< 2021-06-15 12:33:38.394 CEST > DEBUG:  removing write-ahead log file "00000001000000600000007C"

< 2021-06-15 12:33:38.403 CEST > DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1

< 2021-06-15 12:33:38.419 CEST > DEBUG:  oldest MultiXactId member is at offset 1

< 2021-06-15 12:33:38.419 CEST > DEBUG:  MultiXact member stop limit is now 4294914944 based on MultiXact 1

< 2021-06-15 12:33:38.428 CEST > DEBUG:  shmem_exit(0): 1 before_shmem_exit callbacks to make

< 2021-06-15 12:33:38.428 CEST > DEBUG:  shmem_exit(0): 5 on_shmem_exit callbacks to make

So it looks as if the database jumps back "half an hour" to ensure consistent data. Everything in between is lost.

What would be configuration settings to look into to make this more stable? Is there a way to "force flush state to disk" from the application/through postgres API/SQL?

Thank you,
Manuel

-- 

Angular momentum makes the world go 'round.