Re: Segmentation Fault Issue in PostgreSQL 13.4

Vijaykumar Jain <vijaykumarjain.github@xxxxxxxxx> · Sat, 22 Jun 2024 13:47:01 +0530

On Sat, 22 Jun 2024 at 12:35, Veerendra Pulapa <veerendra.pulapa@xxxxxxxxxx> wrote:

Hello Folks,

I’m encountering a segmentation fault issue in PostgreSQL 13.4 when executing a specific function (cause_segfault()). The server process terminates with a signal 11 (Segmentation fault), but the database service continues running without a full shutdown.

Details:

      •     PostgreSQL Version: 13.4
      •     Environment: RHEL 8
      •     Steps to Reproduce:
      1.    Create function: CREATE FUNCTION cause_segfault() RETURNS void AS 'cause_segfault', 'cause_segfault' LANGUAGE C STRICT;
      2.    Execute function: SELECT cause_segfault();

Observations:

      •     Logs indicate the segmentation fault (LOG: server process (PID XXX) was terminated by signal 11: Segmentation fault).

Request for Assistance:

      •     Has anyone encountered similar issues with PostgreSQL 13.4?
      •     Are there known bugs or fixes related to segmentation faults in recent PostgreSQL versions?
      •     Any advice on troubleshooting or resolving this issue would be greatly appreciated.

What is this function?
What is its definition, and how was it deployed so that PostgreSQL can load it?

There are many ways to generate a segfault,
for example:

void cause_segfault(void)
{
  /* write through a pointer to an arbitrary address that is almost certainly not mapped */
  int *p = (int *) 0x12345678;
  *p = 0;
}
You can build that into a shared library and register it as a C function in PostgreSQL, following
PostgreSQL: Documentation: 16: 38.10. C-Language Functions

and then invoke it on a running PostgreSQL to cause a segfault. (A real C-language function also needs the version-1 calling convention described on that page: PG_FUNCTION_INFO_V1 and a Datum return value.)
You can enable core dumps and then generate a stack trace from the core file (for example with gdb); that should point you to the exact place in the function where it segfaulted.
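
To make "enable a core dump" concrete: whether a crashing process writes a core file depends, among other things, on its RLIMIT_CORE soft limit. Below is a minimal sketch of lifting that limit from inside a process, assuming a POSIX system; lift_core_limit is a hypothetical helper name, not something from PostgreSQL. For the server itself the usual route is ulimit -c unlimited in the shell that starts it, or pg_ctl's -c (--core-files) option.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
 * limit so that a crash is allowed to write a core file.
 * Returns 0 on success, -1 on failure. */
int lift_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    /* The soft limit may always be raised up to the hard limit. */
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Where the core file ends up still depends on the system (on Linux, kernel.core_pattern), so this only covers the limit part.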

Since this segfault is harmless (it did not cause data corruption or WAL corruption), the database restart that it triggered did not have to replay a bad WAL file.
As a result, it started fine.
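
The log line from the report (server process ... was terminated by signal 11) is what a parent process sees when one of its children dies from SIGSEGV while the parent itself keeps running. Here is a small sketch of that mechanism outside PostgreSQL, assuming a POSIX system. run_in_child is a hypothetical helper name, and the address is the arbitrary one from the example above, assumed to be unmapped.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write through a pointer to an address that is assumed to be unmapped. */
void cause_segfault(void)
{
    volatile int *p = (volatile int *) 0x12345678;
    *p = 0;
}

/* Hypothetical helper: run fn in a child process and return the number of
 * the signal that killed it, 0 if it exited normally, -1 on error. */
int run_in_child(void (*fn)(void))
{
    pid_t pid = fork();
    int status = 0;

    if (pid < 0)
        return -1;
    if (pid == 0) {
        fn();        /* the crash happens here, in the child only */
        _exit(0);
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status)) {
        fprintf(stderr, "child (PID %d) was terminated by signal %d\n",
                (int) pid, WTERMSIG(status));
        return WTERMSIG(status);
    }
    return 0;
}
```

Calling run_in_child(cause_segfault) should return SIGSEGV (11 on Linux) while the caller carries on. The postmaster is in a similar position, except that after a backend crash it also terminates the other backends and runs crash recovery before accepting connections again.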

If the WAL needed for recovery had been corrupt, the db would not have started.

If you want to play with WAL corruption:

# immediate shutdown; this forces crash recovery on the next start
pg_ctl -D db1 -l db1.log -m immediate stop

# find the REDO WAL file (datadir is db1 in this example); the WAL files after it are the ones to corrupt
pg_controldata -D datadir | grep REDO

# corrupt the WAL file (don't do this on prod setups or on systems without backups)
dd if=/dev/urandom of=db1/pg_wal/00000001000000000000001D bs=8k seek=1 count=1 conv=notrunc
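
In case it helps to see how a REDO location maps to a segment name like the one in the dd command: pg_controldata prints the REDO WAL file name directly, but the name can also be derived from the timeline and the LSN. This is a rough sketch assuming the default 16 MB WAL segment size; wal_file_name is a hypothetical helper, and the LSN 0/1D000028 in the note below is an illustrative value, not taken from a real cluster.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format the name of the WAL segment file containing
 * the given LSN. The name is 24 hex digits: timeline, then the high and
 * low parts of the segment number. name needs room for at least 25 bytes. */
void wal_file_name(char *name, size_t len, uint32_t timeline,
                   uint64_t lsn, uint64_t seg_size)
{
    uint64_t segno = lsn / seg_size;
    /* number of segments per 4 GB of WAL address space */
    uint64_t segs_per_id = UINT64_C(0x100000000) / seg_size;

    snprintf(name, len, "%08X%08X%08X",
             (unsigned) timeline,
             (unsigned) (segno / segs_per_id),
             (unsigned) (segno % segs_per_id));
}
```

With timeline 1, LSN 0/1D000028 and 16 MB segments this yields 00000001000000000000001D. Segments with names that sort after the REDO file are the later ones.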

Depending on your recovery setup, the corrupt WAL will be replayed on the next start and recovery will fail.

pg_ctl -D db1 -l db1.log start

Then you may want to run
pg_resetwal -D datadir 
to discard the WAL and get the db up and working again. But this also means your db may be in an inconsistent state, since you wiped the WAL and those changes cannot be recovered.

There are more ways to corrupt a db; some of them are described in
How to corrupt your PostgreSQL database (cybertec-postgresql.com)

So, tl;dr:
If the segfault does not cause data corruption, the db might restart fine.
Otherwise there might be issues with recovery, and the db may not start at all.

I can be corrected. It's been a little while since I touched PostgreSQL.

-- 
Thanks, Vijay