I was just trying that. It's always the same (huge) table that crashes pg_dump: a dump that excludes that one table runs fine, while a dump of only that table crashes.
In the system logs I always see a segfault:
May 22 15:22:14 core4 kernel: [337837.874618] postgres[1311]: segfault at 7f778008ed0d ip 000055f197ccc008 sp 00007ffdd1fc15a8 error 4 in postgres[55f1977c0000+727000]
It doesn't seem to be an out-of-memory problem (at least not at the OS level).
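As a rough aid for reading the kernel line above, here is a minimal Python sketch that splits it into its fields. It assumes the usual x86-64 Linux format for this message: hexadecimal values, and an `error` field whose low three bits mean page present, write access and user mode. The function name `parse_segfault` and the dictionary keys are made up for this illustration.

```python
import re

# Kernel segfault line copied verbatim from the system log above.
LINE = ("May 22 15:22:14 core4 kernel: [337837.874618] postgres[1311]: "
        "segfault at 7f778008ed0d ip 000055f197ccc008 sp 00007ffdd1fc15a8 "
        "error 4 in postgres[55f1977c0000+727000]")

PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+) in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def parse_segfault(line):
    """Decode one kernel segfault line (assumed x86-64 Linux format)."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    # All numeric fields in this message are hexadecimal.
    f = {k: int(v, 16) for k, v in m.groupdict().items() if k != "obj"}
    err = f["err"]
    return {
        "object": m.group("obj"),
        "fault_address": hex(f["addr"]),
        # Offset of the faulting instruction inside the mapped binary.
        "offset_in_object": hex(f["ip"] - f["base"]),
        "page_present": bool(err & 1),                 # bit 0
        "access": "write" if err & 2 else "read",      # bit 1
        "mode": "user" if err & 4 else "kernel",       # bit 2
    }

info = parse_segfault(LINE)
print(info)
```

Under those assumptions, `error 4` reads as a user-mode read of an address with no page mapped, and the instruction pointer lies at offset 0x50c008 inside the postgres binary. That would fit the backend following a bad pointer or length rather than being killed for memory, but it is only a reading of the log line.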
The database is currently installed on a dedicated server with 32GB RAM. I tried tweaking some of the memory parameters for postgres, but the crash always happens at exactly the same spot: if I run pg_dump for that one table with and without the memory tweaks, the resulting files are identical.
One thing I just noticed while looking at the dump file: near the end of the file I see this:
2087983804 516130 37989 2218636 3079067 0 0 P4B BcISC IGk L BOT BOP A jC BAA I BeMj/b BceUl6 BehUAn 0Ms A C I4p9CBfUiSeAPU4eDuipKQ
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1145127487 1413694803 21071 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1145127487 1413694803 21071 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
6071772946555290175 1056985679 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
2087983833 554418 37989 5405605 14507502 0 0 P4B Bb8c/ IGk L BOS BOP A Lfh BAA Bg BeMj+2 Bd1LVN BehUAl rlx ABA TOR
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1145127487 1413694803 21071 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1145127487 1413694803 21071 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
6071772946555290175 1056985679 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
4557430888798830399 1061109567 1061109567 1061109567 1061109567 16191 \N \N ??????????????????????????????
2087983833 554418 37989 5405605 14507502 0 0 P4B Bb8c/ IGk L BOS BOP A Lfh BAA Bg BeMj+2 Bd1LVN BehUAl rlx ABA TOR
It looks suspicious; however, there are about 837 more lines before the output stops.
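One way to look at the suspicious rows more closely is to turn the numbers back into the bytes they would occupy on disk. The sketch below is only an illustration under stated assumptions: a little-endian machine, and column widths of 8 bytes for fileid, 4 bytes each for fileindex, jobid, pathid and filenameid, and 2 bytes for deltaseq. Those widths are inferred from the size of the values, not taken from the actual table definition, and the row labels are made up.

```python
# Assumed on-disk widths in bytes for the first six columns
# (fileid, fileindex, jobid, pathid, filenameid, deltaseq).
WIDTHS = (8, 4, 4, 4, 4, 2)

# The three distinct patterns that appear among the suspicious rows above.
ROWS = {
    "filler row": (4557430888798830399, 1061109567, 1061109567,
                   1061109567, 1061109567, 16191),
    "variant A": (4557430888798830399, 1061109567, 1061109567,
                  1145127487, 1413694803, 21071),
    "variant B": (6071772946555290175, 1056985679, 1061109567,
                  1061109567, 1061109567, 16191),
}

def raw_bytes(values, widths=WIDTHS):
    """Concatenate each integer as little-endian bytes of the assumed width."""
    return b"".join(v.to_bytes(w, "little") for v, w in zip(values, widths))

for name, values in ROWS.items():
    print(f"{name}: {raw_bytes(values)!r}")
```

With these assumptions, the common row decodes to nothing but 0x3F bytes (ASCII "?"), which matches the question marks in the last column, and both variants contain the ASCII text "?BADSECTOR". That would suggest these rows are filler written over the original data rather than real table contents, although this is an inference from the numbers alone.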
Nico
On Fri, May 22, 2020 at 3:27 PM Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
On 5/22/20 5:37 AM, Nico De Ranter wrote:
> Hi all,
>
> Postgres version: 9.5
> OS: Ubuntu 18.04.4
>
> I have a 144GB Bacula database that crashes the postgres daemon when I
> try to do a pg_dump.
> At some point the server ran out of diskspace for the database storage.
> I expanded the lvm and rebooted the server. It seemed to work fine,
> however when I try to dump the bacula database the postgres daemon dies
> after about 37GB.
>
> I tried copying the database to another machine and upgrading postgres
> to 11 using pg_upgrade. The upgrade seems to work but I still get
> exactly the same problem when trying to dump the database.
>
> postgres@core4:~$ pg_dumpall --cluster 11/main --file=dump.sql
> pg_dump: Dumping the contents of table "file" failed: PQgetCopyData()
> failed.
> pg_dump: Error message from server: server closed the connection
> unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> pg_dump: The command was: COPY public.file (fileid, fileindex, jobid,
> pathid, filenameid, deltaseq, markid, lstat, md5) TO stdout;
> pg_dumpall: pg_dump failed on database "bacula", exiting
What happens if you try to dump just this table?
Something along the lines of:
pg_dump -t file -d some_db -U some_user
Have you looked at the system logs to see if it is the OS killing the
process?
>
> In the logs I see:
>
> 2020-05-22 14:23:30.649 CEST [12768] LOG: server process (PID 534) was
> terminated by signal 11: Segmentation fault
> 2020-05-22 14:23:30.649 CEST [12768] DETAIL: Failed process was
> running: COPY public.file (fileid, fileindex, jobid, pathid, filenameid,
> deltaseq, markid, lstat, md5) TO stdout;
> 2020-05-22 14:23:30.651 CEST [12768] LOG: terminating any other active
> server processes
> 2020-05-22 14:23:30.651 CEST [482] WARNING: terminating connection
> because of crash of another server process
> 2020-05-22 14:23:30.651 CEST [482] DETAIL: The postmaster has commanded
> this server process to roll back the current transaction and exit,
> because another server process exited abnormally and possibly corrupted
> shared memory.
> 2020-05-22 14:23:30.651 CEST [482] HINT: In a moment you should be able
> to reconnect to the database and repeat your command.
> 2020-05-22 14:23:30.652 CEST [12768] LOG: all server processes
> terminated; reinitializing
> 2020-05-22 14:23:30.671 CEST [578] LOG: database system was
> interrupted; last known up at 2020-05-22 14:15:19 CEST
> 2020-05-22 14:23:30.809 CEST [578] LOG: database system was not
> properly shut down; automatic recovery in progress
> 2020-05-22 14:23:30.819 CEST [578] LOG: redo starts at 197/D605EA18
> 2020-05-22 14:23:30.819 CEST [578] LOG: invalid record length at
> 197/D605EA50: wanted 24, got 0
> 2020-05-22 14:23:30.819 CEST [578] LOG: redo done at 197/D605EA18
> 2020-05-22 14:23:30.876 CEST [12768] LOG: database system is ready to
> accept connections
> 2020-05-22 14:29:07.511 CEST [12768] LOG: received fast shutdown request
>
>
> Any ideas how to fix or debug this?
>
> Nico
>
> --
> Nico De Ranter
> Operations Engineer
> T. +32 16 38 72 10
>
> eSATURNUS
> Philipssite 5, D, box 28
> 3001 Leuven – Belgium
> T. +32 16 40 12 82
> F. +32 16 40 84 77
> www.esaturnus.com <http://www.esaturnus.com>
>
> For Service & Support:
> Support Line Belgium: +32 2 2009897
> Support Line International: +44 12 56 68 38 78
> Or via email: medical.services.eu@xxxxxxxx <mailto:medical.services.eu@xxxxxxxx>
>
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx