Re: could not create lock file postmaster.pid: No such file or directory, but file does exist

Rob Goethals / SNP <Rob.Goethals@xxxxxx> · Mon, 17 Feb 2014 14:42:17 +0100

> -----Original Message-----
> From: Albe Laurenz [mailto:laurenz.albe@xxxxxxxxxx]
> Sent: Monday, 17 February 2014 14:22
> To: Rob Goethals
> Subject: RE: could not create lock file postmaster.pid: No such file or
> directory, but file does exist
> 
> Dear Rob,
> 
> you should send your reply to the list.
> This way
> a) people know that your problem is solved and won't spend their time trying
> to help you.
> b) others can benefit from the information.

OK, understood. I am sending this reply to the list as well.

> 
> >>> This weekend my database crashed while importing some
> >>> OpenStreetMap data and I can’t get it back to work again. It happened
> >>> before and normally I would reset the WAL-dir with the pg_resetxlog
> >>> command. I would lose some data but that would be all.
> >>
> >> That is not a good idea.  PostgreSQL should recover from a crash
> >> automatically.
> >> If you run pg_resetxlog your database cluster is damaged, and all you
> >> should do is pg_dump all the data you can, run initdb and import the data.
> >
> > But what if PostgreSQL doesn't recover automatically? When my database
> > crashes and I try to restart it, I usually get a message like:
> > LOG:  could not open file "pg_xlog/0000000100000114000000D2" (log file
> > 276, segment 210): No such file or directory
> > LOG:  invalid primary checkpoint record
> > LOG:  invalid secondary checkpoint link in control file
> > PANIC:  could not locate a valid checkpoint record
> > LOG:  startup process (PID 3604) was terminated by signal 6: Aborted
> > LOG:  aborting startup due to startup process failure
> 
> Interesting.
> How did you get PostgreSQL into this state?  Did you set fsync=off or similar?
> Which storage did you put pg_xlog on?
> 

I am adding OSM change files to my database with the command:
osm2pgsql --append --database $database --username $user --slim --cache 3000 --number-processes 6 --style /usr/share/osm2pgsql/default.style --extra-attributes changes.osc.gz

At the moment of the crash, the PostgreSQL log says:
2014-02-15 00:49:04 CET  LOG:  WAL writer process (PID 1127) was terminated by signal 6: Aborted
2014-02-15 00:49:04 CET  LOG:  terminating any other active server processes
2014-02-15 00:49:04 CET [unknown] WARNING:  terminating connection because of crash of another server process
2014-02-15 00:49:04 CET [unknown] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

So I don't know exactly what is happening.

When it tries to start up again, this is the log output:
2014-02-15 00:49:08 CET  LOG:  could not open temporary statistics file "global/pgstat.tmp": Input/output error
2014-02-15 00:49:14 CET  LOG:  all server processes terminated; reinitializing
2014-02-15 00:49:17 CET  LOG:  database system was interrupted; last known up at 2014-02-15 00:32:01 CET
2014-02-15 00:49:33 CET [unknown] [unknown]LOG:  connection received: host=[local]
2014-02-15 00:49:33 CET [unknown] FATAL:  the database system is in recovery mode
2014-02-15 00:49:56 CET  LOG:  database system was not properly shut down; automatic recovery in progress
2014-02-15 00:49:57 CET [unknown] [unknown]LOG:  connection received: host=[local]
2014-02-15 00:49:57 CET [unknown] FATAL:  the database system is in recovery mode
2014-02-15 00:50:01 CET  LOG:  redo starts at 114/C8B27330
2014-02-15 00:50:02 CET  LOG:  could not open file "pg_xlog/0000000100000114000000CB" (log file 276, segment 203): No such file or directory
2014-02-15 00:50:02 CET  LOG:  redo done at 114/CAFFFF80
2014-02-15 00:50:02 CET  LOG:  checkpoint starting: end-of-recovery immediate
2014-02-15 00:50:05 CET  PANIC:  could not create file "pg_xlog/xlogtemp.5390": Input/output error
2014-02-15 00:50:22 CET [unknown] [unknown]LOG:  connection received: host=[local]
2014-02-15 00:50:22 CET [unknown] FATAL:  the database system is in recovery mode
2014-02-15 00:50:23 CET  LOG:  startup process (PID 5390) was terminated by signal 6: Aborted
2014-02-15 00:50:23 CET  LOG:  aborting startup due to startup process failure
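
As an aside, the numbers in the "could not open file" messages can be read straight off the segment file name. The following is a minimal sketch, assuming the name consists of three 8-digit hexadecimal fields (timeline, log file, segment); the function name decode_wal_segment is only illustrative and not part of PostgreSQL.

```python
def decode_wal_segment(name):
    """Split a 24-hex-digit WAL segment file name into its three fields.

    Assumption: the name is timeline + log file + segment, 8 hex digits each.
    """
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    timeline = int(name[0:8], 16)
    log_file = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log_file, segment


# File name taken from the startup log above.
print(decode_wal_segment("0000000100000114000000CB"))  # (1, 276, 203)
```

This agrees with the "(log file 276, segment 203)" text that the server prints next to the file name.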

Furthermore, I checked my configuration file, and fsync is indeed set to off.
I mounted a directory on an NTFS network disk (because of the available space, since the amount of OSM data is quite large). This is where I put all my database data, including pg_xlog.
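
For anyone who wants to check the same setting, here is a rough sketch of reading a value out of postgresql.conf-style text. It is a simplification: it assumes "name = value" or "name value" lines with "#" comments, and it does not handle "#" inside quoted values or include files. The helper name read_setting and the sample text are made up for illustration.

```python
def read_setting(conf_text, name):
    """Return the last value assigned to `name`, or None if it is not set."""
    value = None
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, sep, rest = line.partition("=")
        if not sep:
            # The "=" is optional, so also accept "name value".
            parts = line.split(None, 1)
            if len(parts) != 2:
                continue
            key, rest = parts
        if key.strip().lower() == name:
            value = rest.strip().strip("'")  # a later line overrides an earlier one
    return value


# Hypothetical configuration text, not my actual file.
sample = "#fsync = on\nfsync = off   # faster imports\nshared_buffers = 128MB\n"
print(read_setting(sample, "fsync"))  # off
```

On a running server, "SHOW fsync;" in psql gives the effective value directly.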

> > Is there a better procedure to follow when something like this
> > happens? I am fairly new to the whole PostgreSQL thing, so I am very
> > willing to learn all about it any way I can from experienced users. I
> > am googling my way around the internet to try to answer all the
> > questions I have, but as with many things there is usually more
> > than one answer to a problem, and it is very hard for me to figure out
> > which is the best solution.
> 
> No, in that case I would restore from a backup.
> 
> >> One wild guess: could it be that the OS automatically remounted the
> >> file system read-only because it encountered a problem?  Check your
> >> /var/log/messages (I hope the location is the same on Ubuntu and on
> >> RHEL).
> >> In that case unmount, fsck and remount should solve the problem.
> >
> > I am impressed. Your wild guess did exactly the trick. Manually
> > unmounting, checking and remounting was all it needed. Thank you very
> > much!!
> 
> That would suggest that you have a hardware problem with your storage.
> It may be that your file system is corrupted.  Did you fsck it?

fsck didn't work because the file system is mounted via CIFS. So I guess I should let Windows do the checking.
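
To see quickly how a data directory is mounted, something like the sketch below could be used on text in the /proc/mounts format. It assumes whitespace-separated fields (device, mount point, type, options) and ignores escaped spaces in mount points. The names find_mount and NETWORK_FS, and the sample entries, are hypothetical.

```python
# File system types that a client-side fsck cannot check (illustrative list).
NETWORK_FS = {"cifs", "nfs", "nfs4", "smbfs"}


def find_mount(mounts_text, path):
    """Return (device, mount_point, fs_type, options) for the mount holding path."""
    best = None
    wanted = path.rstrip("/") + "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        device, mount_point, fs_type, options = fields[:4]
        prefix = mount_point.rstrip("/") + "/"
        # Keep the longest mount point that is a parent of the path.
        if wanted.startswith(prefix):
            if best is None or len(mount_point) > len(best[1]):
                best = (device, mount_point, fs_type, options.split(","))
    return best


# Hypothetical mount table, not copied from my machine.
sample_mounts = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "//fileserver/share /mnt/pgdata cifs ro,relatime 0 0\n"
)
device, mount_point, fs_type, options = find_mount(sample_mounts, "/mnt/pgdata/pg_xlog")
print(fs_type, "read-only" if "ro" in options else "read-write",
      "no local fsck" if fs_type in NETWORK_FS else "fsck possible")
```

On a real system the text would come from open("/proc/mounts").read(); a "ro" option there would match the read-only remount guessed above.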

> 
> Yours,
> Laurenz Albe

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general