On Wed, Sep 29, 2010 at 10:15 AM, Nigel <nigelspleen@xxxxxxxxx> wrote:
> Hello,
>
> We're running PG 8.3 in a warm standby configuration. About 3 weeks ago we
> had to fail over from the primary to the standby. That worked fine, but
> we're having problems getting standby mode set up again. On the new
> standby, everything works fine for a little while: WALs were rsynced over
> and processed correctly as far as I can tell. But every 65-75 minutes (very
> regularly), a WAL file is copied that's actually a symlink. When the
> standby tries to read the rsynced symlink, it hangs indefinitely, presumably
> because the target of the link doesn't exist on the standby.
>
> In the primary's pg_xlog, I see the expected WAL files with increasing
> numbers and recent modification dates, but every 65-75 files there's one of
> these symlinks. For example:
>
> Sep 28 16:13 0000000300000A5C00000070
> Sep 28 16:15 0000000300000A5C00000071
> Sep 28 16:12 0000000300000A5C00000072
> Sep  5 01:00 0000000300000A5C00000073 -> /srv/db/chdbprod_wal_archives/00000001000009D6000000D6
> Sep 28 16:21 0000000300000A5C00000074
> Sep 28 16:19 0000000300000A5C00000075
>
> The "/srv/db/chdbprod_wal_archives" directory is where incoming WAL files
> used to go, back when the current primary server was the standby. The
> September 5 date you see above is shortly before the failover was done. It
> confused me at first until I remembered that it's the mod date of the target
> of the symlink, not the link itself (which in this case was presumably
> created around 16:20). The target of the symlinks is always the same.
>
> pg_xlog also contains a 00000003.history file, which references the target
> of the symlinks. Here's its contents:
>
> 1 00000001000009D6000000D6 before transaction 0 at 2000-01-01 00:00:00+00
>
> I gather that my problems here are due to having a primary server that was
> itself formerly a standby, but I'm not sure what action to take.
> I don't know enough about how the history files work and what the
> significance of the symlinks is. What purpose do the symlinks serve? Why
> are they recreated regularly at slightly more than hourly intervals? Why do
> they point to a directory that was only used back when the primary was a
> standby? (If it makes any difference, back when the primary server was a
> standby, it was running pg_standby with the -l option.) Does their presence
> mean that something's wrong on the primary, or should they be ignored when
> copying to the standby?

I guess that the cause is the -l option. The symlink to the archived WAL
file is created in pg_xlog by "pg_standby -l". At failover, unfortunately,
that symlink in pg_xlog is renamed to a new WAL file name as part of WAL
recycling. As a result, a symlink to the old archived WAL file remains in
pg_xlog.

AFAIR, the -l option was removed from pg_standby because of this problem.
http://archives.postgresql.org/pgsql-committers/2009-06/msg00323.php

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
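
To illustrate the explanation above, here is a minimal sketch for spotting
such leftover symlinks in pg_xlog on the primary. It is not part of
PostgreSQL or pg_standby: the function name find_wal_symlinks and the
default directory "pg_xlog" are assumptions made for this example. It only
reports entries that are named like WAL segments (24 hexadecimal digits) and
are symlinks. What to do with them, for example leaving them out of the copy
to the standby, is not settled in this thread.

```python
import os
import re
import sys

# WAL segment file names are 24 hexadecimal digits (timeline, log, segment).
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def find_wal_symlinks(xlog_dir):
    """Return sorted (name, target) pairs for WAL-named symlinks in xlog_dir.

    Hypothetical helper for illustration only. os.path.islink() does not
    follow the link, so dangling symlinks are reported too.
    """
    found = []
    for name in os.listdir(xlog_dir):
        path = os.path.join(xlog_dir, name)
        if WAL_NAME.match(name) and os.path.islink(path):
            found.append((name, os.readlink(path)))
    return sorted(found)


if __name__ == "__main__":
    # Assumed default; pass the real pg_xlog path as the first argument.
    xlog_dir = sys.argv[1] if len(sys.argv) > 1 else "pg_xlog"
    for name, target in find_wal_symlinks(xlog_dir):
        print(f"{name} -> {target}")
```

Run against the listing quoted above, this would be expected to report
0000000300000A5C00000073 and its target under /srv/db/chdbprod_wal_archives,
and to skip the regular WAL files and the 00000003.history file.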