
Re: "PANIC: could not open critical system index 2662" - twice

On 14/04/2023 10:42 am, Alban Hertroys wrote:
> Your problem coincides with a thread at freebsd-current with very
> similar data corruption after a recent OpenZFS import: blocks of all
> zeroes, but also missing files. So, perhaps these problems are related?
> Apparently, there was a recent fix for a data corruption issue with the block_cloning feature enabled, but people are still seeing corruption even when they never enabled that feature.
>
> I couldn’t really find the start of the thread in the archives, so this one kind of jumps into the middle of the thread at a relevant-looking point:
>
> https://lists.freebsd.org/archives/freebsd-current/2023-April/003446.html

That thread was a bit over my head, I'm afraid, so I can't say if it's
related. I haven't detected any missing files, anyway.


Well, the problem happened again! Kind of... This time PG did not
crash with the PANIC error from the subject line, but running pg_dump on
certain DBs again fails with:


pg_dump: error: connection to server on socket "/var/run/postgresql/.s.PGSQL.5434" failed: FATAL:  index "pg_class_oid_index" contains unexpected zero page at block 0

The PG server log contains:

2023-05-03 04:31:49.903 UTC [14724] postgres@test_behavior_638186279733138190 FATAL:  index "pg_class_oid_index" contains unexpected zero page at block 0
2023-05-03 04:31:49.903 UTC [14724] postgres@test_behavior_638186279733138190 HINT:  Please REINDEX it.
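The FATAL above says that block 0 of pg_class_oid_index (the index with OID 2662 from the subject line) is entirely zero bytes. As a minimal sketch for confirming that directly on disk, assuming the default 8192-byte block size and that you are reading a copy of the index's relation file (locating that file is left to you; the path is a placeholder), an all-zero check could look like this. `is_zero_block` is a hypothetical helper, not a PostgreSQL tool.

```python
BLCKSZ = 8192  # assumption: default PostgreSQL block size (BLCKSZ)


def is_zero_block(path, block_no=0, block_size=BLCKSZ):
    """Return True if the given block of a relation file holds only zero bytes.

    Hypothetical helper; `path` should point at a copy of the relation file.
    """
    with open(path, "rb") as f:
        f.seek(block_no * block_size)
        data = f.read(block_size)
    if len(data) < block_size:
        # Block lies (partly) beyond the end of the file.
        raise ValueError(f"block {block_no} is beyond the end of {path}")
    return not any(data)  # iterating bytes yields ints; all zero -> True
```

For example, `is_zero_block("/path/to/copy_of_index_file", 0)` returning True would match what the server reports for block 0.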

The server PID does not change on such a pg_dump attempt, so it appears
that this time only the backend process for the pg_dump connection
terminates, not the whole server. I don't see any disk errors and there
haven't been any OS crashes.

This currently happens for two DBs. Both of them are very small DBs
created by automated .NET tests using Npgsql as the client. The code
creates such a test DB from a template DB and tries to drop it at the
end of the test. The drop sometimes times out, and on timeout our test
code tries to drop the DB again (while the first DROP command is likely
still pending on the server). This second attempt to drop the DB also
timed out:

[12:40:39] Npgsql.NpgsqlException : Exception while reading from stream
 ----> System.TimeoutException : Timeout during reading attempt
   at Npgsql.NpgsqlConnector.<ReadMessage>g__ReadMessageLong|194_0(NpgsqlConnector connector, Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
   at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
   at Npgsql.NpgsqlDataReader.NextResult()
   at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteReader(CommandBehavior behavior, Boolean async, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteNonQuery(Boolean async, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteNonQuery()

...
[12:41:41] (same error again for the same DB)
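To make the suspected race concrete, here is a minimal Python model of the drop-and-retry logic described above. It is not the actual .NET test code: `FakeServer` and `drop_with_retry` are hypothetical names, and the model assumes, as suspected above, that a client-side timeout does not stop the DROP already running on the server.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


class FakeServer:
    """Hypothetical stand-in for the server: a DROP keeps running even
    after the client has stopped waiting for it."""

    def __init__(self, drop_seconds):
        self.drop_seconds = drop_seconds
        self._lock = threading.Lock()
        self.in_flight = 0
        self.max_in_flight = 0

    def drop_database(self, name):
        with self._lock:
            self.in_flight += 1
            self.max_in_flight = max(self.max_in_flight, self.in_flight)
        try:
            time.sleep(self.drop_seconds)  # simulate a slow DROP DATABASE
        finally:
            with self._lock:
                self.in_flight -= 1


def drop_with_retry(server, name, client_timeout, attempts=2):
    """Model of the test code: on a client-side timeout, issue the DROP
    again without cancelling the earlier one."""
    pool = ThreadPoolExecutor(max_workers=attempts)
    try:
        for _ in range(attempts):
            future = pool.submit(server.drop_database, name)
            try:
                future.result(timeout=client_timeout)
                return True
            except FutureTimeout:
                continue  # earlier DROP is still running on the "server"
        return False
    finally:
        pool.shutdown(wait=True)  # let the simulated DROPs finish
```

With `FakeServer(drop_seconds=0.3)` and `client_timeout=0.05`, both attempts time out and `server.max_in_flight` ends up as 2, i.e. two DROPs of the same database overlap on the server side.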

From looking at old logs it seems like the same thing happened last time
(in April) as well. That's quite an unlikely coincidence if a bad disk
or bad filesystem were to blame, isn't it?

I've tried to reproduce this by re-running those tests over and over,
but without success so far. So what can I do about this? Do I just try
to drop those databases again manually? But what about the next time it
happens? How do I figure out the cause and prevent this problem?
