Re: Corrupted database's files (linux RAID5 + PostgreSQL 8.3.0)

Sim Zacks <sim@xxxxxxxxxxxxxx> · Wed, 21 May 2008 15:36:22 +0300

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

If you have a backup, the easiest way would be to restore it. There is
also a way to replay the database's log files into the database from a
point in time (i.e. from the time of the last backup) so that you can
recover your data. I've never actually seen it work though.
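
The log-replay route is presumably what the PostgreSQL manual calls
point-in-time recovery (PITR). As far as I understand it, it only helps
if WAL archiving (archive_command) was already configured before the
failure and a base backup exists. A minimal sketch for 8.x, where the
function name and both paths are hypothetical placeholders:

```shell
#!/bin/sh
# Sketch: write a minimal recovery.conf into a restored base backup.
# The server reads this file at startup and replays archived WAL segments.
write_recovery_conf() {
    datadir=$1      # restored base backup (hypothetical path)
    archivedir=$2   # directory where archive_command stored WAL (hypothetical)
    cat > "$datadir/recovery.conf" <<EOF
# Fetch each archived WAL segment (%f) to the path the server asks for (%p).
restore_command = 'cp $archivedir/%f "%p"'
# Optional: stop replay at a given moment instead of the end of the archive.
# recovery_target_time = '2008-05-20 17:00:00 EEST'
EOF
}
```

Usage would be something like `write_recovery_conf /path/to/restored/data
/path/to/wal_archive`, followed by starting the server on the restored
copy. Try it on a copy, not on the only surviving data directory.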

Peter Petrov wrote:
> Hi,
> 
> Today one of the disks was marked as failed .... and now some files
> are corrupted.
> I've decided to copy the pgsqldata directory and try to fix the
> PG_VERSION files (see below for what PostgreSQL doesn't like) ... and
> see if the database will come up.
> While I'm copying the files, I'm open to any other ideas on how to
> deal with the problem ;)
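
Working on a copy is the right instinct. A minimal sketch of that step,
assuming the server is stopped first; the function name and paths are
made up for illustration:

```shell
#!/bin/sh
# Sketch: take a working copy of the data directory before any repair attempt.
# Stop the server first (e.g. pg_ctl -D "$src" stop) so files do not change
# while they are being copied.
copy_datadir() {
    src=$1   # original data directory (hypothetical path)
    dst=$2   # target on a healthy disk (hypothetical path); must not exist yet
    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "refusing to overwrite: $dst" >&2
        return 1
    fi
    # -a keeps ownership, permissions and timestamps, which the server checks.
    cp -a "$src" "$dst"
}
```

Any read errors that cp reports during the copy are themselves useful:
they tell you which files sit on damaged parts of the array.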
> 
> PostgreSQL's log suggests running initdb (HINT message in the log file) -
> what will happen if I then try to copy the rest of the structure into
> the newly created database cluster?
> 
> Linux (Slackware 12.0.0), software RAID5 (partition based) + PostgreSQL
> 8.3.0:
> 
> Here's what happened (from dmesg):
> 
> ---------------------------------------
> # uname -a
> Linux xeonito 2.6.21.5 #3 SMP Tue Oct 2 16:20:48 EEST 2007 i686 Intel(R)
> Xeon(R) CPU           E5335  @ 2.00GHz GenuineIntel GNU/Linux
> 
> ---------------------------------------
> # dmesg
> sd 0:0:3:0: SCSI error: return code = 0x08000002
> sdd: Current: sense key=0x4
>    ASC=0x44 ASCQ=0x0
> Info fld=0x0
> end_request: I/O error, dev sdd, sector 159620863
> sd 0:0:3:0: SCSI error: return code = 0x08000002
> sdd: Current: sense key=0x4
>    ASC=0x44 ASCQ=0x0
> Info fld=0x0
> end_request: I/O error, dev sdd, sector 159617119
> raid5: Disk failure on sdd1, disabling device. Operation continuing on 4 devices
> ......
> 
> RAID5 conf printout:
> --- rd:5 wd:4
> disk 0, o:1, dev:sdb1
> disk 1, o:1, dev:sdc1
> disk 2, o:0, dev:sdd1
> disk 3, o:1, dev:sde1
> disk 4, o:1, dev:sdf1
> RAID5 conf printout:
> --- rd:5 wd:4
> disk 0, o:1, dev:sdb1
> disk 1, o:1, dev:sdc1
> disk 3, o:1, dev:sde1
> disk 4, o:1, dev:sdf1
> 
> ---------------------------------------
> 
> # cat /proc/mdstat
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5]
> [raid4] [multipath] [faulty]
> md1 : active raid5 sdb1[0] sdf1[4] sde1[3] sdd1[5](F) sdc1[1]
>      585924608 blocks level 5, 8192k chunk, algorithm 2 [5/4] [UU_UU]
> 
> md0 : active raid5 sdb2[0] sdf2[4] sde2[3] sdd2[5](F) sdc2[1]
>      390053888 blocks level 5, 1024k chunk, algorithm 2 [5/4] [UU_UU]
> 
> unused devices: <none>
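
Both arrays show [5/4] [UU_UU], i.e. each is running on four of five
members. A small sketch for spotting that automatically; the function
name is made up, and it assumes the two-line per-array layout shown
above:

```shell
#!/bin/sh
# Sketch: list md arrays whose status field contains "_" (a missing member).
degraded_arrays() {
    # $1: file in /proc/mdstat format; defaults to /proc/mdstat itself
    awk '
        /^md[0-9]+ :/          { name = $1 }    # remember the current array
        /\[[U_]*_[U_]*\]/      { print name }   # e.g. [UU_UU] means degraded
    ' "${1:-/proc/mdstat}"
}
```

On the output quoted above this would print md1 and md0, one per line.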
> 
> ---------------------------------------
> 
> And here's what the partitions look like:
> 
> # fdisk  -l /dev/sdb
> 
> Disk /dev/sdb: 249.8 GB, 249865175040 bytes
> 255 heads, 63 sectors/track, 30377 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
> 
>   Device Boot      Start         End      Blocks   Id  System
> /dev/sdb1               1       18237   146488671   83  Linux
> /dev/sdb2           18238       30377    97514550   83  Linux
> 
> ---------------------------------------
> Kernel parameters:
> 
> echo 4200000000 > /proc/sys/kernel/shmmax
> echo 4200000000 > /proc/sys/kernel/shmall
> sysctl -w vm.overcommit_memory=2
> 
> echo 8192 >  /sys/block/md0/md/stripe_cache_size
> echo 8192 >  /sys/block/md1/md/stripe_cache_size
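
Unrelated to the corruption, but worth double-checking: kernel.shmmax
is given in bytes, while kernel.shmall is counted in pages, so writing
the same number to both is probably not what was intended. A sketch of
the conversion (the function name is made up; the page size is taken
from getconf unless passed explicitly):

```shell
#!/bin/sh
# Sketch: convert a shared-memory limit in bytes to a page count for shmall.
shmall_pages() {
    bytes=$1                          # desired limit in bytes (as for shmmax)
    page=${2:-$(getconf PAGE_SIZE)}   # page size in bytes, often 4096 on x86
    echo $(( (bytes + page - 1) / page ))   # round up to whole pages
}
```

With 4096-byte pages, `shmall_pages 4200000000 4096` gives 1025391.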
> 
> ---------------------------------------
> 
> 
> Both md0 and md1 are used by PostgreSQL - initially the setup was not
> designed to use the whole of disks sdb-sdf, but due to size requirements
> I also added the other unused space for PostgreSQL to use.
> 
> 
> And here's PostgreSQL's log (the FATAL message appears when I try to
> connect to the database; of course this is the case for the most
> interesting database ... some other small databases are working fine):
> 
> LOG:  received smart shutdown request
> LOG:  autovacuum launcher shutting down
> LOG:  shutting down
> LOG:  database system is shut down
> LOG:  could not create IPv6 socket: Address family not supported by protocol
> LOG:  database system was shut down at 2008-05-20 17:54:17 EEST
> LOG:  autovacuum launcher started
> LOG:  database system is ready to accept connections
> FATAL:  "base/16399" is not a valid data directory
> DETAIL:  File "base/16399/PG_VERSION" does not contain valid data.
> HINT:  You might need to initdb.
> 
> Of course base/16399/PG_VERSION contains something strange instead of
> the version information:
> 
> # cat base/16399/PG_VERSION
> X
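
On an 8.3 cluster each PG_VERSION file should contain just "8.3". A
sketch for finding every one that does not, so you know which databases
are affected; the function name is made up and the default version is
an assumption based on your description:

```shell
#!/bin/sh
# Sketch: print every PG_VERSION file whose content is not the expected version.
check_pg_version() {
    datadir=$1          # cluster data directory (hypothetical path)
    expected=${2:-8.3}  # major version the cluster was created with (assumed)
    for f in "$datadir"/PG_VERSION "$datadir"/base/*/PG_VERSION; do
        [ -e "$f" ] || continue    # skip an unmatched glob pattern
        # Strip NUL bytes so binary garbage compares cleanly as a string;
        # an unreadable file yields an empty string and is flagged too.
        content=$(tr -d '\000' < "$f" 2>/dev/null)
        if [ "$content" != "$expected" ]; then
            echo "$f"
        fi
    done
}
```

Keep in mind that a damaged PG_VERSION probably means other files in
the same directory are damaged as well, so rewriting it is likely to
get you past the startup check and not much further.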
> 
> 
> ---------------------------------------
> 
> 
> 
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkg0F0YACgkQjDX6szCBa+r5wwCg5Dzms7G3ipmVaoBbCZd+jPp8
TmIAnRrehvG1m+wvERsZ8J8Xw8v9scO5
=5AgU
-----END PGP SIGNATURE-----