Hi all,
Postgres version: 9.5
OS: Ubuntu 18.04.4
I have a 144 GB Bacula database that crashes the postgres daemon when I run pg_dump.
At some point the server ran out of disk space for the database storage. I expanded the LVM volume and rebooted the server. Everything seemed fine afterwards, but whenever I try to dump the bacula database the postgres daemon dies after writing about 37 GB.
I tried copying the database to another machine and upgrading Postgres to 11 using pg_upgrade. The upgrade itself appeared to succeed, but dumping the database fails in exactly the same way.
postgres@core4:~$ pg_dumpall --cluster 11/main --file=dump.sql
pg_dump: Dumping the contents of table "file" failed: PQgetCopyData() failed.
pg_dump: Error message from server: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_dump: The command was: COPY public.file (fileid, fileindex, jobid, pathid, filenameid, deltaseq, markid, lstat, md5) TO stdout;
pg_dumpall: pg_dump failed on database "bacula", exiting
In the logs I see:
2020-05-22 14:23:30.649 CEST [12768] LOG: server process (PID 534) was terminated by signal 11: Segmentation fault
2020-05-22 14:23:30.649 CEST [12768] DETAIL: Failed process was running: COPY public.file (fileid, fileindex, jobid, pathid, filenameid, deltaseq, markid, lstat, md5) TO stdout;
2020-05-22 14:23:30.651 CEST [12768] LOG: terminating any other active server processes
2020-05-22 14:23:30.651 CEST [482] WARNING: terminating connection because of crash of another server process
2020-05-22 14:23:30.651 CEST [482] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2020-05-22 14:23:30.651 CEST [482] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2020-05-22 14:23:30.652 CEST [12768] LOG: all server processes terminated; reinitializing
2020-05-22 14:23:30.671 CEST [578] LOG: database system was interrupted; last known up at 2020-05-22 14:15:19 CEST
2020-05-22 14:23:30.809 CEST [578] LOG: database system was not properly shut down; automatic recovery in progress
2020-05-22 14:23:30.819 CEST [578] LOG: redo starts at 197/D605EA18
2020-05-22 14:23:30.819 CEST [578] LOG: invalid record length at 197/D605EA50: wanted 24, got 0
2020-05-22 14:23:30.819 CEST [578] LOG: redo done at 197/D605EA18
2020-05-22 14:23:30.876 CEST [12768] LOG: database system is ready to accept connections
2020-05-22 14:29:07.511 CEST [12768] LOG: received fast shutdown request
Any ideas how to fix or debug this?
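In case it helps the discussion: my current plan is to bisect the file table with ranged COPY statements until I find the block that kills the backend. This is just a sketch of the idea; the fileid bounds and chunk size below are made up, and in practice the real min/max would come from SELECT min(fileid), max(fileid) FROM file:

```python
# Sketch: generate ranged COPY statements to narrow down which part of
# public.file makes the backend segfault. Any chunk that crashes the server
# can then be split again with a smaller chunk size.

def copy_ranges(lo, hi, chunk):
    """Yield (start, end) fileid windows covering [lo, hi]."""
    start = lo
    while start <= hi:
        end = min(start + chunk - 1, hi)
        yield (start, end)
        start = end + 1

def copy_sql(start, end):
    # One ranged COPY per window; run each via psql and note which one fails.
    return ("COPY (SELECT * FROM public.file "
            f"WHERE fileid BETWEEN {start} AND {end}) TO STDOUT;")

if __name__ == "__main__":
    # Hypothetical bounds for illustration only.
    for start, end in copy_ranges(1, 1_000_000, 250_000):
        print(copy_sql(start, end))
```

The chunks that succeed could also be dumped to files as a partial salvage of the table while the bad range is being isolated.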
Nico
Nico De Ranter
Operations Engineer
T. +32 16 38 72 10
eSATURNUS