Re: Emergency - Need assistance

warren little <warren.little@xxxxxxxxxxxxxxxxxxx> · Mon, 02 Jan 2006 15:26:12 -0700

Sorry,
forgot the attachment.

On Mon, 2006-01-02 at 15:24 -0700, warren little wrote:
> The dump/restore failed even with zero_damaged_pages=true.
> The logfile (postgresql-2006-01-02_130023.log)
> did not have much in the way of useful info. I've attached the section
> of the logfile around the time of the crash.  I cannot find any sign of
> a core file.  Where might the core dump have landed?
> 
> Regarding your comments about losing the evidence, the data I'm trying
> to load is in another database in the same cluster which I have no
> intention of purging until I can get the table moved to the new
> database.
> 
> thanks
> 
> On Mon, 2006-01-02 at 16:34 -0500, Tom Lane wrote:
> > warren little <warren.little@xxxxxxxxxxxxxxxxxxx> writes:
> > > pg_dump: SQL command failed
> > > pg_dump: Error message from server: server closed the connection unexpectedly
> > >         This probably means the server terminated abnormally
> > >         before or while processing the request.
> > > pg_dump: The command was: FETCH 100 FROM _pg_dump_cursor
> > 
> > Hmm.  This could mean corrupted data files, but it's hard to be sure
> > without more info.
> > 
> > > I had removed all the files in pg_log prior to getting this error and no
> > > new logfile was created.  I'm guessing I screwed up the logger when
> > > removing all the files, but I assumed that when writing to the error
> > > logs the backend would create a file if one did not exist.
> > 
> > The file *does* exist, there's just no directory link to it anymore :-(
> > You need to force a logfile rotation, which might be most easily done by
> > stopping and restarting the postmaster.
> > 
> > What you need to do is see the postmaster log entry about the backend
> > crash.  If it's dying on a signal (likely sig11 = SEGV) then inspecting
> > the core file might yield useful information.
> > 
> > > I am currently attempting to run the dump/restore with zero_damaged_pages
> > > turned on to see if the results yield something more useful.
> > 
> > That really ought to be the last resort, not the first one, because it
> > will destroy not only data but most of the evidence about what went
> > wrong...
> > 
> > 			regards, tom lane
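Tom's remark that the deleted logfile "*does* exist, there's just no directory link to it anymore" describes ordinary POSIX unlink semantics: removing a file deletes only its directory entry, and a process that already has the file open keeps writing to the now-nameless file. A minimal Python sketch of that behaviour, assuming Linux or another POSIX system (Windows refuses to delete an open file); the scratch directory and file name are hypothetical stand-ins for pg_log, not the real logger:

```python
import os
import tempfile

# Scratch directory standing in for pg_log (hypothetical layout, POSIX only).
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "postgresql.log")

# A long-running writer opens the file once and keeps the descriptor.
writer = open(log_path, "a")
writer.write("first line\n")
writer.flush()

# Deleting the file removes only its directory entry (the "link").
os.remove(log_path)

# Writes still succeed; they go to the now-nameless file.
writer.write("second line\n")
writer.flush()

st = os.fstat(writer.fileno())
visible = os.path.exists(log_path)  # False: no name points at the file anymore
size_via_fd = st.st_size            # 23 bytes: both lines were stored
links = st.st_nlink                 # 0: no directory entries remain

writer.close()  # only now can the kernel reclaim the file's space
os.rmdir(log_dir)

print(visible, size_via_fd, links)
```

Because the writer still holds the old descriptor, creating a new file under the same name by hand would not help either; the writer has to reopen its output, which is why forcing a logfile rotation (for example by stopping and restarting the postmaster) brings the log back.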
@ 2006-01-02 15:02:02 MST:LOG:  autovacuum: processing database "tigris"
@ 2006-01-02 15:03:01 MST:LOG:  server process (PID 28772) was terminated by signal 11
@ 2006-01-02 15:03:01 MST:LOG:  terminating any other active server processes
[local]@[local] 2006-01-02 15:03:01 MST:WARNING:  terminating connection because of crash of another server process
[local]@[local] 2006-01-02 15:03:01 MST:DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
[local]@[local] 2006-01-02 15:03:01 MST:HINT:  In a moment you should be able to reconnect to the database and repeat your command.
192.168.19.129(50732)@192.168.19.129 2006-01-02 15:03:01 MST:WARNING:  terminating connection because of crash of another server process
192.168.19.129(50732)@192.168.19.129 2006-01-02 15:03:01 MST:DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
192.168.19.129(50732)@192.168.19.129 2006-01-02 15:03:01 MST:HINT:  In a moment you should be able to reconnect to the database and repeat your command.
192.168.19.129(50730)@192.168.19.129 2006-01-02 15:03:01 MST:WARNING:  terminating connection because of crash of another server process
192.168.19.129(50730)@192.168.19.129 2006-01-02 15:03:01 MST:DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
192.168.19.129(50730)@192.168.19.129 2006-01-02 15:03:01 MST:HINT:  In a moment you should be able to reconnect to the database and repeat your command.
192.168.19.129(50731)@192.168.19.129 2006-01-02 15:03:01 MST:WARNING:  terminating connection because of crash of another server process
192.168.19.129(50731)@192.168.19.129 2006-01-02 15:03:01 MST:DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
192.168.19.129(50731)@192.168.19.129 2006-01-02 15:03:01 MST:HINT:  In a moment you should be able to reconnect to the database and repeat your command.
@ 2006-01-02 15:03:01 MST:LOG:  all server processes terminated; reinitializing
@ 2006-01-02 15:03:01 MST:LOG:  database system was interrupted at 2006-01-02 15:02:47 MST
@ 2006-01-02 15:03:01 MST:LOG:  checkpoint record is at 37/D60F93A8
@ 2006-01-02 15:03:01 MST:LOG:  redo record is at 37/D6008018; undo record is at 0/0; shutdown FALSE
@ 2006-01-02 15:03:01 MST:LOG:  next transaction ID: 32196280; next OID: 102041945
@ 2006-01-02 15:03:01 MST:LOG:  next MultiXactId: 41; next MultiXactOffset: 93
@ 2006-01-02 15:03:01 MST:LOG:  database system was not properly shut down; automatic recovery in progress
@ 2006-01-02 15:03:01 MST:LOG:  redo starts at 37/D6008018
@ 2006-01-02 15:03:01 MST:LOG:  record with zero length at 37/D60F93F8
@ 2006-01-02 15:03:01 MST:LOG:  redo done at 37/D60F93A8
@ 2006-01-02 15:03:02 MST:LOG:  database system is ready
@ 2006-01-02 15:03:02 MST:LOG:  transaction ID wrap limit is 1087118600, limited by database "cert"
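The attached log contains the entry Tom asked about: the backend with PID 28772 "was terminated by signal 11", i.e. SIGSEGV, which is the case where a core file would be worth inspecting. For longer logs, a small sketch like the following can pull such entries out. It assumes lines shaped like the ones above (timestamp, then ":LOG:", then the message); the regex and the function name `find_backend_crashes` are hypothetical, not part of PostgreSQL.

```python
import re
import signal
from typing import Iterable, List, Tuple

# Matches the postmaster's crash report as it appears in the log above, e.g.
# "@ 2006-01-02 15:03:01 MST:LOG:  server process (PID 28772) was terminated by signal 11"
CRASH_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \w+):LOG:\s+"
    r"server process \(PID (\d+)\) was terminated by signal (\d+)"
)


def find_backend_crashes(lines: Iterable[str]) -> List[Tuple[str, int, str]]:
    """Return (timestamp, pid, signal name) for each backend killed by a signal."""
    crashes = []
    for line in lines:
        m = CRASH_RE.search(line)
        if not m:
            continue
        timestamp, pid, signum = m.group(1), int(m.group(2)), int(m.group(3))
        try:
            name = signal.Signals(signum).name  # 11 maps to "SIGSEGV"
        except ValueError:
            name = f"signal {signum}"  # number unknown on this platform
        crashes.append((timestamp, pid, name))
    return crashes


# Three lines copied from the log excerpt above.
sample = [
    '@ 2006-01-02 15:02:02 MST:LOG:  autovacuum: processing database "tigris"',
    "@ 2006-01-02 15:03:01 MST:LOG:  server process (PID 28772) was terminated by signal 11",
    "@ 2006-01-02 15:03:01 MST:LOG:  all server processes terminated; reinitializing",
]
print(find_backend_crashes(sample))
```

Against a whole logfile the same function can be fed the open file directly, e.g. `with open("postgresql-2006-01-02_130023.log") as f: find_backend_crashes(f)`.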