I'm working with these guys to resolve the immediate issue, but I
suspect there's a race condition somewhere in the code.
What's happened is that OIDs have been changed in the system. There's
not a lot of table DDL that happens, but there is a substantial
amount of view DDL that can take place. In a nutshell, tables will
sometimes have fields added to them, and when that happens a whole
set of views needs to be re-created to take the new fields into account.
The files for corrupted tables do exist; this seems to be mostly a
catalog corruption issue. I'm seeing both what appear to be
inconsistencies between relcache and the catalog tables as well as
corruption between tables themselves:
emma2=# select * from userdata_8464_campaigns;
ERROR: could not open relation with OID 138807643
emma2=# \d userdata_8464_campaigns
Table
"public.userdata_8464_campaigns"
Column | Type
| Modifiers
-------------------------------+-----------------------------
+------------------------------------------------------------------
campaign_id | bigint | not null
default nextval(('emma_campaigns_seq'::text)::regclass)
account_id | bigint | not null
cep_object_id | bigint | not
null default nextval(('cep_object_seq'::text)::regclass)
campaign_name | character varying(255) | not null
campaign_subject | character varying(255) | not null
layout_page_id | bigint | not null
layout_content_id | bigint | not null
campaign_create_date | timestamp without time zone | not null
default now()
campaign_last_mod_date | timestamp without time zone | not null
default now()
campaign_status | character varying(50) | not null
campaign_parent_id | bigint |
published_campaign_id | bigint |
campaign_plaintext | text |
campaign_plaintext_ds | timestamp without time zone |
delivery_old_score | double precision |
campaign_person_defaults | text |
Inherits: emma_campaigns
select oid from pg_class where relname='userdata_8464_campaigns';
oid
--------
533438
(1 row)
And that file actually does exist on disk...
select * from pg_index where indexrelid=138807643;
indexrelid | indrelid | indnatts | indisunique | indisprimary |
indisclustered | indisvalid | indkey | indclass | indexprs | indpred
------------+----------+----------+-------------+--------------
+----------------+------------+--------+----------+----------+---------
138807643 | 533438 | 1 | t | t |
f | t | 1 | 1980 | |
(1 row)
select * from pg_class where oid=138807643;
relname | relnamespace | reltype | relowner | relam | relfilenode |
reltablespace | relpages | reltuples | reltoastrelid | reltoastidxid
| relhasindex | relisshared | relkind | relnatts | relchecks |
reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey
| relhasrules | relhassubclass | relfrozenxid | relacl | reloptions
---------+--------------+---------+----------+-------+-------------
+---------------+----------+-----------+---------------
+---------------+-------------+-------------+---------+----------
+-----------+-------------+----------+----------+---------
+------------+------------+-------------+----------------
+--------------+--------+------------
(0 rows)
On Jun 5, 2007, at 11:27 AM, Erik Jones wrote:
I originally sent this message from my gmail account yesterday as
we were having issues with our work mail servers yesterday, but
seeing that it hasn't made it to the lists yet, I'm resending from
my registered address. You have my apologies if you receive this
twice.
"Thomas F. O'Connell" <tf ( at ) o ( dot ) ptimized ( dot ) com>
writes:
> I'm dealing with a database where there are ~150,000 rows in
> information_schema.tables. I just tried to do a \d, and it came back
> with this:
> ERROR: cache lookup failed for relation [oid]
> Is this indicative of corruption, or is it possibly a resource
issue?
Greetings,
This message is a follow-up to Thomas's message quoted above (we're
working together on the same database). He received one response
when he sent the above message which was from Tom Lane and can be
easily summarized as him having said that that could happen tables
were being created or dropped while running the \d in psql.
Unfortunately, that wasn't the case, we have now determined that
there is some corruption in our database and we are hoping some of
you back-end gurus might have some suggestions.
How we verified that there is corruption was simply to reindex all
of our tables in addition to getting the same errors when running a
dump this past weekend. We so far have a list of five tables for
which reindex fails with the error: "ERROR: could not open relation
with OID xxxx" (sub xxxx with the five different #s) and one that
fails reindexing with "ERROR: xxxxx is an index" where is an index
on a completely different table. After dropping all of the indexes
on these tables (a couple didn't have any to begin with), we still
cannot run reindex on them. In addition, we can't drop the tables
either (we get the same errors). We can however run alter table
statements on them. So, we have scheduled a downtime for an evening
later this week wherein we plan on bringing the database down for a
REINDEX SYSTEM and before that we are going to run a dump excluding
those tables, restore that on a separate machine and see if these
errors crop up there anywhere. Is there anything else anyone can
think of that we can do to narrow down where the actual corruption
is or how to fix it?
Erik Jones
Software Developer | Emma®
erik@xxxxxxxxxx
800.595.4401 or 615.292.5888
615.292.0777 (fax)
Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
---------------------------(end of
broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
--
Jim Nasby jim@xxxxxxxxx
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)