2011/7/17 Ken Caruso <ken@xxxxxxxxx>:
>
>
> On Sat, Jul 16, 2011 at 2:30 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>>
>> Ken Caruso <ken@xxxxxxxxx> writes:
>> > Sorry, the actual error reported by CLUSTER is:
>>
>> > gpup=> cluster verbose tablename;
>> > INFO: clustering "dbname.tablename"
>> > WARNING: could not write block 12125253 of base/2651908/652397108
>> > DETAIL: Multiple failures --- write error might be permanent.
>> > ERROR: could not open file "base/2651908/652397108.1" (target block
>> > 12125253): No such file or directory
>> > CONTEXT: writing block 12125253 of relation base/2651908/652397108
>>
>> Hmm ... it looks like you've got a dirty buffer in shared memory that
>> corresponds to a block that no longer exists on disk; in fact, the whole
>> table segment it belonged to is gone. Or maybe the block or file number
>> in the shared buffer header is corrupted somehow.
>>
>> I imagine you're seeing errors like this during each checkpoint attempt?
>
> Hi Tom,
> Thanks for the reply.
> Yes, I tried a pg_start_backup() to force a checkpoint and it failed due to
> similar error.
>
>>
>> I can't think of any very good way to clean that up. What I'd try here
>> is a forced database shutdown (immediate-mode stop) and see if it starts
>> up cleanly. It might be that whatever caused this has also corrupted
>> the back WAL and so WAL replay will result in the same or similar error.
>> In that case you'll be forced to do a pg_resetxlog to get the DB to come
>> up again. If so, a dump and reload and some manual consistency checking
>> would be indicated :-(
>
> Before seeing this message, I restarted Postgres and it was able to get to a
> consistent state at which point I reclustered the db without error and
> everything appears to be fine. Any idea what caused this? Was it something
> to do with the Vacuum Full?

Block number 12125253 is bigger than any block we can find in
base/2651908/652397108.1.

Should the table size be in the 100 GB range or in the 2-3 GB range?
This should help decide: in the former case, at least one segment has
probably disappeared; in the latter, the shared buffer has become corrupted.

Ken, you didn't change RELSEG_SIZE, right? (It has to be changed in the
source code before you compile PostgreSQL yourself.)

In both cases a hardware check would be welcome, I believe.

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL: Expertise, Training and Support
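P.S. For reference, here is a rough sketch of the arithmetic behind the size question. It assumes the default build settings (BLCKSZ of 8192 bytes and RELSEG_SIZE of 131072 blocks, i.e. 1 GB segment files), which nobody in this thread has confirmed for Ken's installation. The function names are made up for illustration and are not part of PostgreSQL.

```python
# Sketch only: maps a block number to the segment file that would hold it.
# BLCKSZ and RELSEG_SIZE are the stock PostgreSQL defaults (an assumption here).
BLCKSZ = 8192          # bytes per block
RELSEG_SIZE = 131072   # blocks per segment file (1 GiB with 8 kB blocks)


def locate_block(block, relseg_size=RELSEG_SIZE):
    """Return (segment_number, block_within_segment) for a block number."""
    return divmod(block, relseg_size)


def segment_file(relation_path, segno):
    """First segment has no suffix; later ones are named .1, .2, ..."""
    return relation_path if segno == 0 else f"{relation_path}.{segno}"


def min_relation_bytes(block, blcksz=BLCKSZ):
    """Smallest relation size (in bytes) that can contain this block."""
    return (block + 1) * blcksz


if __name__ == "__main__":
    block = 12125253
    segno, offset = locate_block(block)
    print("segment file :", segment_file("base/2651908/652397108", segno))
    print("block in seg :", offset)
    print("min size GiB :", round(min_relation_bytes(block) / 2**30, 1))
```

Under those assumed defaults, the block would sit in segment 92, which implies a table of roughly 92 GiB. If the table is really only 2-3 GB, a valid block number could not be that high, which points toward a corrupted buffer header instead.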