I've been seeing the following error in one database of ours:
"cache lookup failed for relation 7640518"
The SQL that apparently triggers this is:
drop table if exists ns_e5461ae570429d0b7863cce9ef4d4ead;
Unfortunately, manual attempts to reproduce the issue have failed. In normal operation, this statement is run as one of several parallel queries, and the tables are by nature, short lived. That said, they are not temporary tables.
This is one of two very similar databases, and we are running the same software (same version) on top of each. The databases are in different versions of postgresql. Db #1 is postgresql 9.2.3, and db #2 (the one exhibiting the above behavior) is postgresql 9.0.11.
One other item of note: db #2 has recently had an OID wrap-around, which makes me suspect that plays some part in this behavior. I've looked at the caching code in postgresql, and though I cannot claim to have a thorough understanding of how it works, I have a theory.
How the cache works (as I understand it):
When a cache lookup is performed, tuples are looked up by OID. The cache contents are hashed into buckets. If an item is found in the cache, it is promoted to the top of the bucket so that subsequent searches are faster. If an item is not in the cache, it is looked up in the system catalog, and an entry is inserted into the cache. If a lookup in the catalog fails, a negative entry is added to the cache for the tuple. Multiple entries can exist for the same tuple. The latest one is just promoted to the top of the bucket, and the other gets aged out of the cache, since it is never again accessed.
Theory:
Given that we have wrapped around our OID counter, it is possible to have multiple entries in the cache for the same OID. If one relation is deleted, and a negative entry inserted into the cache, attempts to look up the other may erroneously produce a negative cache hit, yielding our "cache lookup failed for relation" error.
Is this a possibility? Are there any other obvious explanation for this? The results from google related to this error seem to point to catalog corruption, or a postgres bug.
Any pointers/enlightenment would be appreciated.
-davidc
David Clymer
VistaShare
866-828-4782, ext. 828
www.VistaShare.com
www.facebook.com/vistashare
www.twitter.com/vistashare