On Tue, Mar 01, 2011 at 01:20:43PM -0800, daveg wrote:
> On Tue, Mar 01, 2011 at 12:00:54AM +0200, Heikki Linnakangas wrote:
> > On 28.02.2011 23:28, daveg wrote:
> > >On Wed, Jan 12, 2011 at 10:46:14AM +0200, Heikki Linnakangas wrote:
> > >>We'll likely need to go back and forth a few times with various
> > >>debugging patches until we get to the heart of this.
> > >
> > >Anything new on this? I'm seeing it on one of my client's production boxes.
> >
> > I haven't heard anything from the OP since.
> >
> > >Also, what is the significance, i.e. what is the risk or damage potential if
> > >this flag is set incorrectly?
> >
> > Sequential scans will honor the flag, so you might see some dead rows
> > incorrectly returned by a sequential scan. That's the only "damage", but
> > an incorrectly set flag could be a sign of something more sinister, like
> > corrupt tuple headers. The flag should never be set incorrectly, so if
> > you see that message you have hit a bug in PostgreSQL, or you have bad
> > hardware.
> >
> > This flag is quite new, so a bug in PostgreSQL is quite possible. If you
> > still have a backup that contains those incorrectly set flags, I'd like
> > to see what the page looks like.
>
> I ran vacuums on all the affected tables last night. I plan to take a downtime
> to clear the buffer cache and then to run vacuums on all the dbs in the
> cluster.
>
> Most, but not all, of the tables involved are catalogs.
>
> However, I could probably pick up your old patch sometime next week if it
> recurs and send you page images.

After a restart and a vacuum of all dbs, with no other activity, things were quiet for a couple of hours, and then we started seeing these PD_ALL_VISIBLE messages again. Going back through the logs, we have been getting these since before mid-January at least.

Oddly, this only happens on four systems, all of them new Dell 32-core, 512GB Nehalem machines using iSCSI partitions served off a NetApp.
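Heikki's point that sequential scans honor the flag can be made concrete with a toy model. This is a Python sketch, not PostgreSQL code: `ToyPage`, `ToyTuple`, `seq_scan` and `flag_consistent` are hypothetical names, and a single `dead` boolean stands in for the real xmin/xmax visibility rules. It assumes, as Heikki describes, that a scan skips the per-tuple visibility check on a page whose all-visible bit is set, which is why a wrongly set bit lets a dead row through.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTuple:
    tid: int
    dead: bool  # hypothetical stand-in for "deleted and invisible to every snapshot"


@dataclass
class ToyPage:
    all_visible: bool  # hypothetical stand-in for the page-level PD_ALL_VISIBLE bit
    tuples: list = field(default_factory=list)


def tuple_visible(t: ToyTuple) -> bool:
    """Toy per-tuple visibility check (the real one inspects xmin/xmax and hint bits)."""
    return not t.dead


def seq_scan(page: ToyPage) -> list:
    """Toy sequential scan: if the page claims all-visible, per-tuple checks are skipped."""
    if page.all_visible:
        return [t.tid for t in page.tuples]
    return [t.tid for t in page.tuples if tuple_visible(t)]


def flag_consistent(page: ToyPage) -> bool:
    """Toy check: the bit is only valid if every tuple on the page is actually visible."""
    return not page.all_visible or all(tuple_visible(t) for t in page.tuples)


page = ToyPage(all_visible=False,
               tuples=[ToyTuple(1, False), ToyTuple(2, True), ToyTuple(3, False)])
print(seq_scan(page), flag_consistent(page))  # [1, 3] True

page.all_visible = True  # incorrectly set: tuple 2 is dead
print(seq_scan(page), flag_consistent(page))  # [1, 2, 3] False
```

`flag_consistent` is only loosely in the spirit of the check VACUUM makes before it logs the PD_ALL_VISIBLE warning and clears the bit; the server's actual logic is C and considerably more involved.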
Our older 8-core, 64GB hosts have never logged any of these errors. I'm not saying it is related to the hardware; these hosts are doing a lot more work than the old ones, so it may be a concurrency problem that just never came up at lower load levels before.

PostgreSQL version is 8.4.4.

I'll pick up Heikki's page-logging patch and run it for a bit to get some damaged page images. What else could I be doing to track this down?

-dg

--
David Gould       daveg@xxxxxxxxx  510 536 1443    510 282 0869
If simplicity worked, the world would be overrun with insects.

--
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin