Index error on recovery

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi

The details given below is for a test server, that I brought back up from the base (after running a day worth wals and by giving recovery_target_time before starting the recovery). The recovery went fine and the DB came online as expected. But I see that although trivial select queries seem to work well (while using indexes of the table concerned) it throws up errors which indicate that the index is corrupted when we try to run a query that requires access to all rows of the table.

My question is that if this is index corruption (seems like it) where did we go wrong with the recovery (the live server is still working fine without any such error in the live-server-logs)? Do we need to reindex after such a recovery (I thought that was required only when we do something like pgresetxlog)? If this isn't a fault with recovery, does it mean that the primary DB (which is currently active) need strict checking? Any way we could do that, without bringing down the DB (i.e. anything besides a PGDump)?

Thanks
Manoj K P


============================
   
PG Version:
==========
mydb=# SELECT  version () ;
                                                    version                                                      
------------------------------------------------------------------------------------------------------------------
PostgreSQL 8.4.9 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-50), 64-bit



Successful WAL Recovery:
=======================

Dec  6 14:25:47 database_host_name postgres[19329]: [3150-1] 2011-12-06 14:25:47.207 GMT 19329 LOG:  restored log file "000000010000011000000013" from archive
Dec  6 14:25:48 database_host_name postgres[19329]: [3151-1] 2011-12-06 14:25:48.540 GMT 19329 LOG:  restored log file "000000010000011000000014" from archive
Dec  6 14:25:49 database_host_name postgres[19329]: [3152-1] 2011-12-06 14:25:49.833 GMT 19329 LOG:  restored log file "000000010000011000000015" from archive
Dec  6 14:25:51 database_host_name postgres[19329]: [3153-1] 2011-12-06 14:25:51.755 GMT 19329 LOG:  restored log file "000000010000011000000016" from archive
Dec  6 14:25:53 database_host_name postgres[19329]: [3154-1] 2011-12-06 14:25:53.155 GMT 19329 LOG:  restored log file "000000010000011000000017" from archive
Dec  6 14:25:54 database_host_name postgres[19329]: [3155-1] 2011-12-06 14:25:54.462 GMT 19329 LOG:  restored log file "000000010000011000000018" from archive
Dec  6 14:25:54 database_host_name postgres[19329]: [3156-1] 2011-12-06 14:25:54.570 GMT 19329 LOG:  recovery stopping before commit of transaction 19723746, time 2011-11-24 12:39:00.005756+00
Dec  6 14:25:54 database_host_name postgres[19329]: [3157-1] 2011-12-06 14:25:54.570 GMT 19329 LOG:  redo done at 110/18F069E8
Dec  6 14:25:54 database_host_name postgres[19329]: [3158-1] 2011-12-06 14:25:54.570 GMT 19329 LOG:  last completed transaction was at log time 2011-11-24 12:38:59.966238+00
Dec  6 14:25:54 database_host_name postgres[19329]: [3159-1] 2011-12-06 14:25:54.597 GMT 19329 LOG:  selected new timeline ID: 2
Dec  6 14:25:54 database_host_name postgres[19329]: [3160-1] 2011-12-06 14:25:54.623 GMT 19329 LOG:  archive recovery complete
Dec  6 14:25:55 database_host_name postgres[19382]: [20-1] 2011-12-06 14:25:55.845 GMT 19382 LOG:  restartpoint complete: wrote 396106 buffers (37.8%); write=318.653 s, sync=0.000 s, total=318.654 s
Dec  6 14:25:55 database_host_name postgres[19382]: [21-1] 2011-12-06 14:25:55.845 GMT 19382 LOG:  recovery restart point at 10E/FAE0DFA8
Dec  6 14:25:55 database_host_name postgres[19382]: [22-1] 2011-12-06 14:25:55.848 GMT 19382 LOG:  checkpoint starting: end-of-recovery immediate wait
Dec  6 14:26:01 database_host_name postgres[19382]: [23-1] 2011-12-06 14:26:01.484 GMT 19382 LOG:  checkpoint complete: wrote 376660 buffers (35.9%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=5.635 s, sync=0.000 s, total=5.636 s
Dec  6 14:26:04 database_host_name postgres[19328]: [1-1] 2011-12-06 14:26:04.340 GMT 19328 LOG:  database system is ready to accept connections


Basic Querying uses Index and works fine:
========================================

mydb =# EXPLAIN  SELECT  * from tb1_1  order by "ObjId" desc limit 10 ;

                                                             QUERY PLAN                                                      
       
------------------------------------------------------------------------------------------------------------------------------
--------
Limit  (cost=0.00..7.41 rows=10 width=28)
  ->  Index Scan Backward using "IND_tbl_1_ObjId" on tbl_1  (cost=0.00..1174428208.24 rows=1584137511 width=28)



mydb=# SELECT  * from users_activity  order by "ObjId" desc limit 10 ;
ObjId      |         date                    | Usr    | Obj_typ      |    Id     | Act_id   | new_id
------------+---------------------------------+--------+--------------+-----------+----------+----------------
1907972745 | Thu 24 Nov 07:38:59.013068 2011 |    272 |            2 | 479937676 |        1 |              
1907972743 | Thu 24 Nov 07:38:59.829819 2011 |    272 |            2 | 350683600 |        1 |              
1907972742 | Thu 24 Nov 07:38:59.679056 2011 |      0 |            2 | 479937674 |       11 |              
1907972741 | Thu 24 Nov 07:38:59.550655 2011 |    272 |            2 | 423704886 |        1 |              
1907972740 | Thu 24 Nov 07:38:59.355022 2011 |    239 |            2 | 441683722 |        1 |              
1907972739 | Thu 24 Nov 07:38:59.40387 2011  |    272 |            2 |   1061714 |        1 |              
1907972738 | Thu 24 Nov 07:38:59.262335 2011 |    272 |            2 | 350683588 |        1 |              
1907972737 | Thu 24 Nov 07:38:59.146628 2011 |    272 |            2 | 340702121 |        1 |              
1907972736 | Thu 24 Nov 07:38:59.102374 2011 |    239 |            2 | 467763707 |        1 |              
1907972735 | Thu 24 Nov 07:38:59.06309 2011  |    272 |            2 | 423704873 |        1 |              
(10 rows)




Accessing Each Row:
==================

mydb=# ALTER TABLE tbl_1 ADD new_id bigint ;
mydb=# EXPLAIN UPDATE tbl_1 SET new_id=id ;
                                 QUERY PLAN                                  
-------------------------------------------------------------------------------
Seq Scan on tbl_1  (cost=0.00..30853774.16 rows=1554095373 width=34)
(1 row)


mydb=# UPDATE tbl_1 SET new_id=id ;

ERROR:  right sibling's left-link doesn't match: block 41923 links to 4268638 instead of expected 41922 in index "IND_tbl_1_ObjId"


Further, tried to work around the Index error:
=============================================
mydb=# DROP INDEX "IND_tb_1_ObjId" ;
mydb=# UPDATE tbl_1 SET new_id=id ;

ERROR:  right sibling's left-link doesn't match: block 52886 links to 4266706 instead of expected 52885 in index "IND_tbl_1_Act_ObjId"


But now we have an error with the next index on the same table.
==============================================================

Other notes:
1. Primary server was running when the base backup was taken.
2. Base backup was taken with start / stop and recovery time was 1-2 days after that, so enough logs were played on the standby server.



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux