Today our monitoring reported that disk space usage on a PG slave instance had reached 96%. I logged in to the machine and found that the pg_xlog directory takes up more than 2TB, which is abnormal.
The number of WAL files in the pg_xlog directory is more than 130k, while we set wal_keep_segments to 8192.
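To put the two numbers side by side, here is a rough size calculation. It is only a sketch: it assumes the default 16MB WAL segment size (a custom build could differ), and the helper name wal_gib is made up for illustration.

```shell
# Hypothetical helper: approximate size in GiB of a number of WAL segments.
# Assumes the default 16 MiB segment size unless a second argument is given.
wal_gib() {
  segments=$1
  segment_mib=${2:-16}
  echo $(( segments * segment_mib / 1024 ))
}

wal_gib 130000   # files observed in pg_xlog -> 2031
wal_gib 8192     # configured keep count     -> 128
```

So about 130k segments is consistent with the roughly 2TB seen on disk, while 8192 segments alone would account for only about 128GB.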
I thought something was blocking the replay, so I checked pg_stat_activity, pg_prepared_statements, pg_xact, etc., but everything looked normal.
I ran:
ps auxwww | grep postgres
and could see that the WAL receiver and streaming receiver are working happily, because the WAL file name and the streaming log ID keep changing.
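Besides reading the process titles, the received and replayed WAL positions can be queried directly on the slave. A minimal sketch, assuming psql can connect to the slave; the function name replay_status and the PSQL override variable are made up for illustration, while the SQL functions called are ones that exist in 9.1.

```shell
# Hypothetical helper: print recovery state plus receive/replay positions.
# Extra arguments (e.g. -U, -h, -d) are passed through to psql.
# PSQL may be set to point at a different psql binary.
replay_status() {
  "${PSQL:-psql}" -X -A -t "$@" -c "SELECT pg_is_in_recovery(), pg_last_xlog_receive_location(), pg_last_xlog_replay_location(), pg_last_xact_replay_timestamp();"
}
```

For example, running `replay_status -U postgres -d postgres` twice a few seconds apart should show whether the replay location is advancing along with the receive location.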
So I have no idea what is going on.
I then restarted the slave PG and found that it started recovering from a very old WAL file, from one month ago.
We are now setting up a new slave for the master while letting the recovery on this slave continue.
The PG version is 9.1.9; the OS is CentOS 6 x86-64.
Jov
blog: http://amutu.com/blog