Re: Postgres error: could not open relation base/xxxxx/yyyyy

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Whoever did the memory installation, should also have taken care of checking it's working ok.
A good idea is to use ECC memory.
Does fsck on the remote system run with reporting problems as well?? (it might report problems such as incorrect block counts, missing inodes etc...)

Also check for disk, controller problems.

ÎÏÎÏ Monday 15 November 2010 14:42:32 Î/Î Pablo Delgado DÃaz-Pache ÎÎÏÎÏÎ:
> Hi Achilleas,
> 
> Thanks for your quick response.
> We do have backups and a couple of spare servers running in parallel so
> we're safe in that sense. Thanks for your advice anyway.
> 
> To run *memtest86 *we have to go to the datacenter and that will take us a
> few days. Is there anything we can do remotely?
> 
> Could it be anything different than memory HW problem?
> Btw, we are not using that server at the moment :)
> 
> Thanks
> 
> On Mon, Nov 15, 2010 at 12:03 PM, Achilleas Mantzios <
> achill@xxxxxxxxxxxxxxxxxxxxx> wrote:
> 
> > Stop using the system immediately, since many things inserted to the DB
> > might simply be garbage.
> > Inspect your memory with memtest86.
> >
> > I would even suggest moving to a new HW if available, and start working
> > into two parallel directions:
> >
> > a) try to bring your DB into a sane state
> > b) try to fix your HW server, which apparently has problems.
> >
> > Files are not "disappearing" like that, in normal situations.
> > You cannot trust your current HW to run postgresql. Its a guarantee method
> > to lose your data, if you
> > go on by this start/stop/restore/pray circle.
> >
> > IMPORTANT: Handle your old (pre-memory upgrade) backups like expensive
> > chinese vases.
> > Start from those good and trustworthy backups to do any database "replay"
> > (or similar) work which
> > would lead to a viable situation.
> >
> > ÎÏÎÏ Monday 15 November 2010 11:55:38 Î/Î Pablo Delgado DÃaz-Pache ÎÎÏÎÏÎ:
> > > Hi all,
> > >
> > > We've been using postgres for 9 years without a problem until now! Two
> > > problems in a very short time!
> > > The first one is described in
> > >
> > http://postgresql.1045698.n5.nabble.com/Autovacuum-seems-to-block-database-WARNING-worker-took-too-long-to-start-td3264261.html
> > > This is another one (not related I think) ...
> > >
> > > Postgres server is usually working fine. All of a sudden we start getting
> > > these errors ...
> > >
> > > * **2010-11-09 11:49:15.320
> > > CET|2|database1|10.19.0.51(18895)|20929|SELECT|4cd926fd.51c1|2010-11-09
> > > 11:48:29 CET|10/417796|1390150|postgres| LOG:  duration: 1518.422 ms
> > >  execute <unnamed>: SELECT id_token_fk,xxxxxxxxx ORDER BY avadate*
> > > * **2010-11-09 11:52:25.364
> > > CET|1|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09
> > > 11:51:59 CET|7/430041|0|postgres| ERROR:  could not open relation
> > > base/273198960/273198979: No such file or directory*
> > > * **2010-11-09 11:52:25.364
> > > CET|2|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09
> > > 11:51:59 CET|7/430041|0|postgres| STATEMENT:  SELECT
> > id_token_fkxxxxxxxxxxx
> > >  ORDER BY avadate*
> > > * **2010-11-09 11:52:29.981
> > > CET|3|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09
> > > 11:51:59 CET|7/430049|0|postgres| ERROR:  could not open relation
> > > base/273198960/273199235: No such file or directory*
> > > * **2010-11-09 11:52:30.988
> > > CET|6|database1|10.19.0.51(23286)|21566|PARSE|4cd927cf.543e|2010-11-09
> > > 11:51:59 CET|7/430050|0|postgres| STATEMENT:  SELECT max(avadate)
> > xxxxxxxx
> > > 32036)*
> > > * **2010-11-09 11:53:36.346
> > > CET|16|database2|10.19.0.42(44916)|22107|SELECT|4cd9280e.565b|2010-11-09
> > > 11:53:02 CET|94/516004|0|postgres| STATEMENT:  SELECT * FROM "photos"
> > > xxxxxxxxxxxxxx LIMIT 1*
> > > * **2010-11-09 11:53:37.956
> > > CET|17|database2|10.19.0.42(44916)|22107|SELECT|4cd9280e.565b|2010-11-09
> > > 11:53:02 CET|94/516025|0|postgres| ERROR:  could not open relation
> > > base/271253899/271254075: No such file or directory*
> > > * **................*
> > > * **................*
> > > * **2010-11-09 11:53:55.560 CET|111|||26090||4cc6e970.65ea|2010-10-26
> > > 16:45:04 CEST||0|| ERROR:  could not open relation
> > base/273198960/273199235:
> > > No such file or directory*
> > > * **2010-11-09 11:53:55.560 CET|112|||26090||4cc6e970.65ea|2010-10-26
> > > 16:45:04 CEST||0|| CONTEXT:  writing block 8866 of relation
> > > base/273198960/273199235*
> > > * **2010-11-09 11:53:55.560 CET|113|||26090||4cc6e970.65ea|2010-10-26
> > > 16:45:04 CEST||0|| WARNING:  could not write block 8866 of
> > > base/273198960/273199235*
> > > * **2010-11-09 11:53:55.560 CET|114|||26090||4cc6e970.65ea|2010-10-26
> > > 16:45:04 CEST||0|| DETAIL:  Multiple failures --- write error might be
> > > permanent.*
> > > * **2010-11-09 11:53:56.590 CET|115|||26090||4cc6e970.65ea|2010-10-26
> > > 16:45:04 CEST||0|| ERROR:  could not open relation
> > base/273198960/273199235:
> > > No such file or directory*
> > > * **2010-11-09 11:53:56.590 CET|116|||26090||4cc6e970.65ea|2010-10-26
> > > 16:45:04 CEST||0|| CONTEXT:  writing block 8866 of relation
> > > base/273198960/273199235*
> > >
> > > See that there are 2 different databases involved. (database1 and
> > > database2).
> > >
> > > Looking for distinct errors (among the many we have in the log) I find
> > there
> > > are only 4 involved ...
> > >
> > > * **base/271253899/271254075*
> > > * **base/273198960/273198979*
> > > * **base/273198960/273199235*
> > > * **base/273198960/273199253*
> > >
> > > and those files are not in the postgres base directory.
> > >
> > > To fix it we have no option but to restart postgres (which restarts fine
> > > with a /etc/init.d/postgresql stop & start)
> > > However, once we restarted postgres some data was corrupted. Tables that
> > > used to have 4,5 million rows had only 60 rows. As a consequence we had
> > to
> > > restore from file system backup.
> > > Once we did that, it worked fine for a few days until it happened again.
> > > We're worried it can happen again!
> > >
> > > Could this error be a hardware problem?
> > > We recently increased the memory from 8GB to 28GB, although it was
> > working
> > > fine for more than 3 weeks.
> > > We also recently upgraded from postgres 8.3.6 to 8.4.5, althought it also
> > > worked fine for a few months.
> > > Upgrading to postgres 9 is easy for us. however, not sure that would
> > help.
> > >
> > > Some info of our server:
> > >
> > > OS: Centos 5.5
> > > Kernel: 2.6.18-194.1.el5
> > > Postgres version: 8.4.5 (installation out-of-the-box using yum)
> > > Server memory: 28GB
> > >
> > >
> > > Any help would be appreciated
> > >
> > > Pablo
> > >
> >
> >
> >
> > --
> > Achilleas Mantzios
> >
> > --
> > Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-admin
> >
> 



-- 
Achilleas Mantzios

-- 
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux