Ward,

I've experienced the exact problem you describe. The two machines were identical in every way (make, model, disk layout, OS, etc.), and the failure happened regardless of which machine was the primary and which was the warm standby. Note that I was not running pgAgent. I was using pg_standby to implement copying of WAL files between machines. It would copy the WAL file to a shared network directory, where the warm standby would pick up the file and use it, until the fatal error you describe occurred.

I discovered that during a copy operation Windows allocates the entire file size on the target before the copy completes. This differs from Unix, and may have something to do with the errors we are seeing. I'm speculating here, but I believe that when the recovery code "sees" a 16 MB file it assumes the entire file contents are available, which is not necessarily the case on Windows.

I know some folks recommend rsync, but that requires installing Cygwin, and my client isn't happy with that idea. Copying the WAL file to a temporary location and then moving it to the target location may mitigate the problem, since move operations in Windows (on the same disk drive, anyway) simply rejigger the file's directory entry and don't reallocate any disk space. I haven't tried it yet, but I'm moving in that direction.

Regards,
Bob Lunney

--- On Tue, 12/2/08, Ward Eaton <Ward.Eaton@xxxxxxxxxxx> wrote:
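
The copy-to-a-temporary-location-then-move idea above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not something tested against pg_standby: the function name `stage_then_move` and the `.partial` suffix are hypothetical, and it assumes the standby only looks for the final WAL file name, so a temporary name in the same directory is ignored. The temporary file is created inside the target directory so that the final rename stays on one volume.

```python
import os
import shutil
import tempfile


def stage_then_move(src_path, dest_dir):
    """Copy src_path into dest_dir under a temporary name, then rename it.

    Hypothetical helper: the final file name only appears in dest_dir
    after every byte has been written.
    """
    final_path = os.path.join(dest_dir, os.path.basename(src_path))

    # Create the temporary file in dest_dir itself so the rename below
    # does not cross volumes (a cross-volume move would be a copy again).
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".partial")
    try:
        with os.fdopen(fd, "wb") as tmp, open(src_path, "rb") as src:
            shutil.copyfileobj(src, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        # Rename within the same directory; replaces any existing file.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Do not leave a half-written temporary file behind.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

A call such as `stage_then_move(wal_file, shared_dir)` would take the place of a plain copy into the shared directory. Whether this actually avoids the recovery error is still untested speculation, as noted above.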