Re: cache lookup failed for index


 



Hi Kevin,

Thanks for the response.

> Now we're getting somewhere.  The disk drive "became corrupt" while
> PostgreSQL was running?  Was the drive unmounted or remounted while
> PostgreSQL was running, or did you stop PostgreSQL first?  Do you
> have any errors in the PostgreSQL log from the time this was all
> going on?

The failure happened because the Django webapp we're running isn't closing its database connections properly, so memory fills up completely and the server hangs. Yesterday, when this happened, the entire network interface became inoperable, which meant that the iSCSI connection to the shared drive stopped working and data became corrupt.

I stopped the postgresql service before unmounting and remounting the target.

My first concern is restoring the database. I'll fix the problems with Django and Apache later; I can deal with those. I'm also going to set up a series of database backups that can be used to restore data quickly if this happens again. For now, my only concern is getting this back to baseline.
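For the series of backups, a scheduled logical dump is one common approach. The following is a minimal sketch, not a tested production script. It assumes a Python 3 interpreter is available (stock CentOS 5.5 ships an older Python 2), that `pg_dump` is on the PATH and can connect without a password prompt, and it uses a hypothetical database name `mydb`, backup directory `/var/backups/pgsql` and retention count of seven. It writes a custom-format dump (`pg_dump -Fc`) to a timestamped file and then deletes all but the newest few.

```python
import datetime
import pathlib
import subprocess

# Hypothetical values: change to match your database and storage.
DB_NAME = "mydb"
BACKUP_DIR = pathlib.Path("/var/backups/pgsql")
KEEP = 7  # number of most recent dumps to retain


def dump_command(db_name, backup_dir, now):
    """Return (argv, target) for a custom-format pg_dump into a file whose
    name embeds a timestamp that sorts chronologically."""
    target = backup_dir / f"{db_name}-{now:%Y%m%d-%H%M%S}.dump"
    return ["pg_dump", "-Fc", "-f", str(target), db_name], target


def files_to_prune(names, keep):
    """Given dump file names for one database, return those older than the
    newest `keep`. Relies on the timestamp format sorting chronologically."""
    ordered = sorted(names)
    if keep <= 0:
        return ordered
    return ordered[:-keep]


def run_backup(db_name=DB_NAME, backup_dir=BACKUP_DIR, keep=KEEP):
    backup_dir.mkdir(parents=True, exist_ok=True)
    argv, target = dump_command(db_name, backup_dir, datetime.datetime.now())
    # check=True raises CalledProcessError if pg_dump exits non-zero,
    # so old dumps are only pruned after a successful new one.
    subprocess.run(argv, check=True)
    names = [p.name for p in backup_dir.glob(f"{db_name}-*.dump")]
    for name in files_to_prune(names, keep):
        (backup_dir / name).unlink()
    return target


if __name__ == "__main__":
    print(run_backup())
```

Run from cron (nightly, say), this keeps a rolling window of dumps. A custom-format dump is restored with `pg_restore`, and it is worth test-restoring one into a scratch database before relying on the series.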

> One more question occurs to me -- it seems unusual for someone to be
> running on a single disk with no RAID and no backup, but to be
> running with a version of PostgreSQL which is only about a month old.
> Was 8.1.21 the version you were running at the time of the failure,
> or have you upgraded during the recovery attempt?  If you've
> upgraded, the version in use when the corruption occurred could be
> relevant.

This storage server has RAID and there are backups; it just so happens that the most recent usable backup is from June 20th. I completely forgot to configure the backups on this server. I normally wouldn't make this mistake, but I did this time.

On the version: this is the version that ships with CentOS 5.5. This was a clean CentOS 5.5 install, and it's been live for about a month.

>> Also, it would help a lot to know what your postgresql.conf file
>> contains (excluding all comments).

The only uncommented lines are:
max_connections = 500
shared_buffers = 4000
redirect_stderr = on
log_directory = 'pg_log'               
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1440
log_rotation_size = 0
redirect_stderr = on
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
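A listing like the one above can be produced mechanically. This is a rough sketch, assuming Python 3 is available; it drops blank and comment lines, strips trailing `#` comments, and flags settings that appear more than once, such as `redirect_stderr` here. It does not handle a `#` inside a quoted value. The helper `shared_buffers_bytes` is illustrative and assumes the default 8 kB block size: in 8.1, `shared_buffers` is a count of buffers rather than a memory size, so 4000 works out to 32,768,000 bytes, about 31 MiB.

```python
import sys


def active_settings(text):
    """Return (name, value) pairs for the uncommented settings in a
    postgresql.conf, in file order. Naive: a '#' inside a quoted value
    would be treated as the start of a comment."""
    settings = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        if "=" in line:
            name, value = line.split("=", 1)
        else:
            parts = line.split(None, 1)  # the '=' is optional in postgresql.conf
            if len(parts) != 2:
                continue
            name, value = parts
        settings.append((name.strip(), value.strip()))
    return settings


def duplicates(settings):
    """Names that are set more than once, in order of first repetition."""
    seen, dups = set(), []
    for name, _ in settings:
        if name in seen and name not in dups:
            dups.append(name)
        seen.add(name)
    return dups


def shared_buffers_bytes(value, block_size=8192):
    """Memory used by shared_buffers when the value is a buffer count
    (as in 8.1), assuming the default 8 kB block size."""
    return int(value) * block_size


if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        settings = active_settings(f.read())
    for name, value in settings:
        print(f"{name} = {value}")
    for name in duplicates(settings):
        print(f"# note: {name} is set more than once")
```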

> I can't, in good conscience, recommend any recovery attempts until
> you confirm that you have a copy to restore if the cleanup effort
> misfires.

I have a full backup of the entire directory structure I took shortly after the database became unusable.
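A copy like this can also be scripted so that it refuses to run while the server appears to be up, which is the condition Kevin asked for. This is a minimal Python 3 sketch under stated assumptions: the data directory and destination paths are passed in as arguments, and the presence of `postmaster.pid` is used only as a rough signal that PostgreSQL may still be running, since a crash can leave a stale pid file behind. `shutil.copytree` keeps permission bits but not file ownership, so run it as the `postgres` user, or use `cp -a` as root instead.

```python
import pathlib
import shutil
import time


def copy_data_dir(pgdata, dest_root):
    """Copy an entire PostgreSQL data directory into a new timestamped
    folder under dest_root and return the new folder's path.

    Assumption: a postmaster.pid file means the server may still be
    running, so the copy is refused. A crash can leave a stale pid file;
    confirm the server is really down before working around this check.
    """
    pgdata = pathlib.Path(pgdata)
    if (pgdata / "postmaster.pid").exists():
        raise RuntimeError("postmaster.pid found; stop PostgreSQL first")
    dest = pathlib.Path(dest_root) / ("pgdata-" + time.strftime("%Y%m%d-%H%M%S"))
    # The default symlinks=False follows symbolic links, so a relocated
    # pg_xlog or a tablespace link is copied by content, not as a bare link.
    shutil.copytree(pgdata, dest)
    return dest


if __name__ == "__main__":
    import sys
    # Usage: python copy_pgdata.py /path/to/pgdata /path/to/backup_root
    print(copy_data_dir(sys.argv[1], sys.argv[2]))
```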

On Wed, Jun 30, 2010 at 10:14 AM, Kevin Grittner <Kevin.Grittner@xxxxxxxxxxxx> wrote:
Nathan Robertson <nathan.robertson@xxxxxxxxx> wrote:

> There was a cascade effect. Apache failed which caused the server
> overall to fail. The data is stored on an iSCSI drive and the
> mount of the iSCSI drive became corrupt when everything failed. I
> was able to remount the drive and get access to the data, but now I
> have this index error.

Now we're getting somewhere.  The disk drive "became corrupt" while
PostgreSQL was running?  Was the drive unmounted or remounted while
PostgreSQL was running, or did you stop PostgreSQL first?  Do you
have any errors in the PostgreSQL log from the time this was all
going on?

Also, how confident are you that the Apache failure caused the drive
to be corrupted?  That sounds *much* less likely than the other way
around.  Without understanding that better, fixing one particular
problem in the database on this machine might be like rearranging
deck chairs on a sinking ship.

> So, this is where I'm at. If anyone could help resolve the index
> cache error, I would be eternally grateful.

We'd like to help, and perhaps someone else can suggest something on
the basis of information you've provided so far, but I'm not
comfortable suggesting something without a little more of a sense of
what happened and what your configuration is.

>> Also, it would help a lot to know what your postgresql.conf file
>> contains (excluding all comments).

This would still be useful.

>> But first and foremost, you should make a file-copy backup of
>> your entire PostgreSQL data directory tree with the PostgreSQL
>> server stopped, if you haven't done that already.  Any attempt at
>> recovery may misfire, and you might want to get back to what you
>> have now.

I can't, in good conscience, recommend any recovery attempts until
you confirm that you have a copy to restore if the cleanup effort
misfires.

One more question occurs to me -- it seems unusual for someone to be
running on a single disk with no RAID and no backup, but to be
running with a version of PostgreSQL which is only about a month old.
Was 8.1.21 the version you were running at the time of the failure,
or have you upgraded during the recovery attempt?  If you've
upgraded, the version in use when the corruption occurred could be
relevant.

-Kevin

