Greg Smith wrote:
On Wed, 13 Jun 2007, Johannes Konert wrote:
If someone corrects the server's system time/date to a date before the
current time (e.g. sets the clock two hours back), then the newer WAL
files will have an older timestamp and will be deleted by accident.
This should never happen; no one should ever touch the clock by hand
on a production system. The primary and backup server should both be
synchronized via NTP. If you're thinking about clock changes for
daylight saving time, those shouldn't have any effect on timestamps,
which should be stored in UTC. If you're on Windows,
It's not Windows; it will be Debian Linux.
I completely agree with you that of course our servers synchronize
themselves via NTP with global time, but we already had a case where,
for some reason, NTP did not work and the clocks drifted away from each
other. If you have to manage several servers, you might not notice that
an NTP daemon has stopped working or that a new firewall now blocks its
NTP packets... and time goes by, because everything seems to work
just fine.
Then one nice day you realize that one, two or many of your servers
are running on their own time, and you need to bring them back to
synchronized time while they are online. If you make your applications
aware of such effects and use the system nanotime or global counters
where possible, then even these time corrections can be handled.
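To illustrate what "being aware of such effects" can look like, here is a minimal Python sketch, not our actual code. It compares the wall clock against the monotonic clock, which is not affected when someone sets the system time. The class name ClockStepDetector and the tolerance value are assumptions made up for this example.

```python
import time


class ClockStepDetector:
    """Flags wall-clock jumps by comparing against the monotonic clock."""

    def __init__(self, tolerance_s=1.0, wall=time.time, mono=time.monotonic):
        # The clock functions are injectable so the logic can be tested
        # without actually changing the system time.
        self._wall = wall
        self._mono = mono
        self._tolerance_s = tolerance_s
        self._last_wall = wall()
        self._last_mono = mono()

    def check(self):
        """Return the wall-clock step (seconds) since the last check.

        A negative value means the wall clock was set back. Returns 0.0
        when the difference stays within the tolerance.
        """
        now_wall, now_mono = self._wall(), self._mono()
        # Both clocks should advance by the same amount; any difference
        # is a correction that was applied to the wall clock.
        step = (now_wall - self._last_wall) - (now_mono - self._last_mono)
        self._last_wall, self._last_mono = now_wall, now_mono
        return step if abs(step) > self._tolerance_s else 0.0
```

An application could call check() periodically and switch to counter-based ordering, or simply log a warning, whenever it returns a non-zero value.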
But I agree with you: of course normally this will never happen...but it
happened once.
You're working hard to worry about problems that should be eliminated
by the overall design of your system. If you can't trust your system
clocks and that files are being copied with their attributes intact,
you should consider thinking about how to resolve those problems
rather than working around them.
Yes, but in my opinion a risk still remains.
It's not just PostgreSQL that will suffer from weird, unpredictable
behavior in a broken environment like that. To give a Windows example:
if you're running in a Windows Domain configuration and the client
time drifts too far from the server, you can get "The system cannot log
you on due to the following error: There is a time difference between
the Client and Server." when trying to log in.
If we add a new server to the cluster, the application will check the
time as in your Windows example, but if a server is already in the
cluster and working, it cannot simply shut down because of a time difference.
Greg, thanks for your sophisticated hints.
But the thread is going a little off-topic now, I guess :)
The issue with the time-dependency of WAL archiving and deletion
is solved for me for now by relying on a global, ever-increasing counter.
I am sure more questions will come before long, and I look forward to
reading any hints then, if you and others have time to read them.
Regards Johannes