Update: someone unleashed a 'cleanup script' via Puppet yesterday on multiple hosts, and it greedily deleted files that had not been modified in 15 days. This is the most likely culprit, so the mystery is basically solved. Thankfully this is in QA, whew! It would still be interesting to know whether there are ways of having Postgres check and verify that the files it expects to find are there, and to get an idea of the extent of the damage.
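For what it's worth, here is a rough sketch of how the extent of the damage might be gauged from outside the server. It assumes the expected paths can still be listed, for example by running something like SELECT pg_relation_filepath(oid) FROM pg_class in each database (which only works if the catalogs themselves are still readable), and it then checks each path against the data directory. The function name find_missing_files and the demo directory are made up for illustration. It only checks the first segment of each relation, so the extra segments of relations over 1 GB and the fork files are not covered.

```python
import os
import tempfile


def find_missing_files(data_dir, relative_paths):
    """Return the relative paths that do not exist as files under data_dir.

    relative_paths is assumed to hold values such as "base/125542/12631",
    one per relation. Blank entries (for example from relations that have
    no storage, such as views) are skipped.
    """
    missing = []
    for rel in relative_paths:
        rel = rel.strip()
        if not rel:
            continue  # nothing on disk is expected for this entry
        if not os.path.isfile(os.path.join(data_dir, rel)):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # Demo against a throwaway directory standing in for the data directory.
    demo_dir = tempfile.mkdtemp()
    os.makedirs(os.path.join(demo_dir, "base", "125542"))
    # Create one file that "survived"; the other path is left missing.
    open(os.path.join(demo_dir, "base", "125542", "11111"), "w").close()

    expected = ["base/125542/11111", "base/125542/12631", ""]
    for path in find_missing_files(demo_dir, expected):
        print("missing:", path)
```

Against a real server the list would come from the query output and data_dir would be the cluster's data directory; I have not verified this on the affected hosts.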
On Fri, Oct 4, 2013 at 12:10 PM, Mike Broers <mbroers@xxxxxxxxx> wrote:
Strange, this is happening in a totally different environment now too. The only thing these two environments share is a SAN, but I wouldn't think something going on at the SAN level would make files disappear. Any suggestions are greatly appreciated.

On Fri, Oct 4, 2013 at 9:40 AM, Mike Broers <mbroers@xxxxxxxxx> wrote:

Hello, our PostgreSQL 9.2.4 QA database (thankfully it's just QA) seems to be hosed.

Starting at around 3:39am last night I started seeing errors about missing files, and now I cannot run pg_dump or a vacuum without it complaining about files that it cannot find, with errors like this:

ERROR: could not open file "base/125542/12631"

When I check the filesystem, the files are indeed not there. The 1am regular vacuum completed and its log is clean. The Postgres log is clean before these errors occurred.

Since this is QA we do not perform backups, and the solution if we cannot repair the problem will be to create a fresh QA server, but I am intrigued about how to determine the source of the problem and the extent of the problem.

Is there a way to force vacuum to continue on errors, or an alternate way to help determine all the missing files?

It might be totally unrelated, but yesterday morning on this QA server I stopped Postgres, created a symlink for pg_xlog so that it was writing to a different volume, and restarted. This was working fine all day, so it's possibly a red herring, but I thought I should mention it.

Any advice is appreciated, thanks!