Bricklen Anderson <BAnderson@xxxxxxxxxxxx> writes:
>> Tom Lane wrote:
>>> But anyway, the evidence seems pretty clear that in fact end of WAL is
>>> in the 73 range, and so those page LSNs with 972 and 973 have to be
>>> bogus. I'm back to thinking about dropped bits in RAM or on disk.

> memtest86+ ran for over 15 hours with no errors reported.
> e2fsck -c completed with no errors reported.

Hmm ... that's not proof your hardware is ok, but it at least puts the
ball back in play.

> Any ideas on what I should try next? Considering that this db is not
> in production yet, I _do_ have the liberty to rebuild the database if
> necessary. Do you have any further recommendations?

If the database isn't too large, I'd suggest saving aside a physical
copy (eg, cp or tar dump taken with postmaster stopped) for forensic
purposes, and then rebuilding so you can get on with your own work.

One bit of investigation that might be worth doing is to look at every
single 8K page in the database files and collect information about the
LSN fields, which are the first 8 bytes of each page. In a non-broken
database all of these should be less than or equal to the current ending
WAL offset (which you can get with pg_controldata if the postmaster is
stopped). We know there are at least two bad pages, but are there more?
Is there any pattern to the bad LSN values? Also it would be useful to
look at each bad page in some detail to see if there's any evidence of
corruption extending beyond the LSN value.

			regards, tom lane
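
The page scan suggested above (read the first 8 bytes of every 8K page and
compare them against the end of WAL) could be sketched roughly as follows.
This is only an illustration, not a tested forensic tool. It assumes the
8-byte LSN is stored as two 32-bit unsigned integers (log file id, then
byte offset) in the machine's native byte order, taken here to be
little-endian. The function names (page_lsns, suspicious_pages, parse_lsn)
are hypothetical, and the end-of-WAL value is assumed to be typed in by hand
in the usual hex "id/offset" form after reading the pg_controldata output.

```python
import struct

PAGE_SIZE = 8192  # assumed default block size; adjust if the cluster differs


def parse_lsn(text):
    """Turn a hex 'id/offset' string such as '73/ABCDEF' into a tuple."""
    log_id, offset = text.split("/")
    return int(log_id, 16), int(offset, 16)


def page_lsns(path, byte_order="<"):
    """Yield (page_number, (log_id, offset)) for each full page in a file.

    byte_order is a struct prefix: '<' little-endian (assumed), '>' big-endian.
    """
    with open(path, "rb") as f:
        page_no = 0
        while True:
            page = f.read(PAGE_SIZE)
            if len(page) < PAGE_SIZE:
                break  # end of file (or a trailing partial page)
            # The LSN field is the first 8 bytes of the page.
            yield page_no, struct.unpack(byte_order + "II", page[:8])
            page_no += 1


def suspicious_pages(path, end_of_wal, byte_order="<"):
    """Return pages whose LSN is greater than the end-of-WAL position."""
    # Tuples compare by log id first, then by offset within that log file.
    return [(n, lsn) for n, lsn in page_lsns(path, byte_order)
            if lsn > end_of_wal]
```

One would call suspicious_pages() on each table and index file of the saved
physical copy (with the postmaster stopped) and print the results as
"%X/%X" to look for a pattern in the bad values. All-zero pages come back
as 0/0 and so are never flagged.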