Re: Large Database \d: ERROR: cache lookup failed for relation ...

Jim Nasby <decibel@xxxxxxxxxxx> · Tue, 5 Jun 2007 23:49:31 -0500

I'm working with these guys to resolve the immediate issue, but I  
suspect there's a race condition somewhere in the code.

What's happened is that OIDs have been changed in the system. There's  
not a lot of table DDL that happens, but there is a substantial  
amount of view DDL that can take place. In a nutshell, tables will  
sometimes have fields added to them, and when that happens a whole  
set of views needs to be re-created to take the new fields into account.

The files for the corrupted tables do exist; this seems to be mostly a  
catalog corruption issue. I'm seeing what appear to be inconsistencies  
between relcache and the catalog tables, as well as inconsistencies  
among the catalog tables themselves:

emma2=# select * from userdata_8464_campaigns;
ERROR:  could not open relation with OID 138807643
emma2=# \d userdata_8464_campaigns
                                          Table "public.userdata_8464_campaigns"
          Column          |            Type             |                            Modifiers
--------------------------+-----------------------------+------------------------------------------------------------------
 campaign_id              | bigint                      | not null default nextval(('emma_campaigns_seq'::text)::regclass)
 account_id               | bigint                      | not null
 cep_object_id            | bigint                      | not null default nextval(('cep_object_seq'::text)::regclass)
 campaign_name            | character varying(255)      | not null
 campaign_subject         | character varying(255)      | not null
 layout_page_id           | bigint                      | not null
 layout_content_id        | bigint                      | not null
 campaign_create_date     | timestamp without time zone | not null default now()
 campaign_last_mod_date   | timestamp without time zone | not null default now()
 campaign_status          | character varying(50)       | not null
 campaign_parent_id       | bigint                      |
 published_campaign_id    | bigint                      |
 campaign_plaintext       | text                        |
 campaign_plaintext_ds    | timestamp without time zone |
 delivery_old_score       | double precision            |
 campaign_person_defaults | text                        |
Inherits: emma_campaigns

select oid from pg_class where relname='userdata_8464_campaigns';
  oid
--------
533438
(1 row)

And that file actually does exist on disk...

select * from pg_index where indexrelid=138807643;
 indexrelid | indrelid | indnatts | indisunique | indisprimary | indisclustered | indisvalid | indkey | indclass | indexprs | indpred
------------+----------+----------+-------------+--------------+----------------+------------+--------+----------+----------+---------
  138807643 |   533438 |        1 | t           | t            | f              | t          | 1      | 1980     |          |
(1 row)

select * from pg_class where oid=138807643;
 relname | relnamespace | reltype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid | relhasindex | relisshared | relkind | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | relhasrules | relhassubclass | relfrozenxid | relacl | reloptions
---------+--------------+---------+----------+-------+-------------+---------------+----------+-----------+---------------+---------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+-------------+----------------+--------------+--------+------------
(0 rows)
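
The manual check above (a pg_index row whose indexrelid has no pg_class
row) can be generalized to the whole database. What follows is only a
sketch, assuming the catalogs can still be scanned; the output column
aliases are made up for illustration, and it has not been run against
the affected database.

```sql
-- Optional: discourage index scans so the result does not depend on
-- catalog indexes that may themselves be damaged. Comparing the output
-- with and without these settings may hint at index vs. heap damage.
SET enable_indexscan = off;
SET enable_bitmapscan = off;

-- List pg_index rows whose index has no matching pg_class row.
SELECT i.indexrelid      AS orphaned_index_oid,
       i.indrelid        AS table_oid,
       t.relname         AS table_name   -- NULL if the table row is missing too
FROM pg_catalog.pg_index i
LEFT JOIN pg_catalog.pg_class c ON c.oid = i.indexrelid
LEFT JOIN pg_catalog.pg_class t ON t.oid = i.indrelid
WHERE c.oid IS NULL
ORDER BY i.indrelid;

RESET enable_indexscan;
RESET enable_bitmapscan;
```

If this behaves as expected, the row for indexrelid 138807643 on table
533438 should appear in the output, along with any other tables in the
same state.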

On Jun 5, 2007, at 11:27 AM, Erik Jones wrote:

I originally sent this message from my gmail account yesterday, as  
we were having issues with our work mail servers, but seeing that  
it hasn't made it to the lists yet, I'm resending from my  
registered address.  You have my apologies if you receive this  
twice.

"Thomas F. O'Connell" <tf ( at ) o ( dot ) ptimized ( dot ) com>  
writes:
> I'm dealing with a database where there are ~150,000 rows in
> information_schema.tables. I just tried to do a \d, and it came back
> with this:
>
> ERROR:  cache lookup failed for relation [oid]
>
> Is this indicative of corruption, or is it possibly a resource issue?

Greetings,

This message is a follow-up to Thomas's message quoted above (we're  
working together on the same database). He received one response,  
from Tom Lane, which can be summarized as saying that this could  
happen if tables were being created or dropped while \d was running  
in psql. Unfortunately, that wasn't the case. We have now determined  
that there is some corruption in our database, and we are hoping some  
of you back-end gurus might have some suggestions.

We verified that there is corruption by reindexing all of our  
tables; we also got the same errors when running a dump this past  
weekend.  So far we have a list of five tables for which reindex  
fails with the error "ERROR: could not open relation with OID xxxx"  
(sub xxxx with the five different #s) and one that fails reindexing  
with "ERROR: xxxxx is an index", where xxxxx is an index on a  
completely different table. After dropping all of the indexes on  
these tables (a couple didn't have any to begin with), we still  
cannot run reindex on them. In addition, we can't drop the tables  
either (we get the same errors). We can, however, run alter table  
statements on them. So, we have scheduled a downtime for an evening  
later this week, during which we plan to bring the database down  
for a REINDEX SYSTEM. Before that, we are going to run a dump  
excluding those tables, restore it on a separate machine, and see  
if these errors crop up anywhere there. Is there anything else  
anyone can think of that we can do to narrow down where the actual  
corruption is or how to fix it?
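
One possible way to narrow down where the corruption sits, offered as
an untested sketch rather than a known fix, is to look for rows in
other catalogs that reference a relation OID with no pg_class row.
The choice of catalogs and the output column names below are
assumptions made for illustration.

```sql
-- Count dangling references to missing pg_class rows, per catalog.
SELECT 'pg_attribute'::text AS source_catalog,
       a.attrelid           AS missing_oid,
       count(*)             AS dangling_rows
FROM pg_catalog.pg_attribute a
LEFT JOIN pg_catalog.pg_class c ON c.oid = a.attrelid
WHERE c.oid IS NULL
GROUP BY a.attrelid

UNION ALL

-- Inheritance links (the affected table inherits from emma_campaigns).
SELECT 'pg_inherits'::text, i.inhrelid, count(*)
FROM pg_catalog.pg_inherits i
LEFT JOIN pg_catalog.pg_class c ON c.oid = i.inhrelid
WHERE c.oid IS NULL
GROUP BY i.inhrelid

UNION ALL

-- Rewrite rules, which is where re-created views would show up.
SELECT 'pg_rewrite'::text, r.ev_class, count(*)
FROM pg_catalog.pg_rewrite r
LEFT JOIN pg_catalog.pg_class c ON c.oid = r.ev_class
WHERE c.oid IS NULL
GROUP BY r.ev_class

UNION ALL

-- Dependency entries for relations; these are consulted by DROP.
SELECT 'pg_depend'::text, d.objid, count(*)
FROM pg_catalog.pg_depend d
LEFT JOIN pg_catalog.pg_class c ON c.oid = d.objid
WHERE d.classid = 'pg_catalog.pg_class'::regclass
  AND c.oid IS NULL
GROUP BY d.objid;
```

An empty result would suggest these catalogs are at least consistent
with pg_class; any rows returned give specific OIDs to compare against
the ones in the reindex errors.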

Erik Jones

Software Developer | Emma®
erik@xxxxxxxxxx
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

--
Jim Nasby                                            jim@xxxxxxxxx
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)