Re: Corrupt indexes on slave when using pg_bulkload on master

James Cowell <jcowell@xxxxxxxxxxxxxx> · Mon, 10 Dec 2012 14:08:14 +0000 (GMT)

Hello,

Does anyone else have any insight or information around this issue?  I can't find anything out there on the web.

If it's a simple incompatability then fine, but I'd still like to understand why the indexes on the secondary node corrupt when the data seems to be replicated without issue.

Thanks,

James

  From: James Cowell <jcowell@xxxxxxxxxxxxxx>
 To: 'Jeff Janes' <jeff.janes@xxxxxxxxx> 
Cc: pgsql-general@xxxxxxxxxxxxxx 
 Sent: Wednesday, 5 December 2012, 18:56
 Subject: Re:  Corrupt indexes on slave when using pg_bulkload on master

Hi Jeff,

Thanks for your reply.

I can
 understand that it would be incompatible if you bypassed the WAL logs,
but I've got it configured to use the buffered loader so that WAL logs are
written.  The data replicates fine, the indexes don't, yet a reindex on the
primary fixes the index on the secondary.  I'd like to understand why if
possible.

I've seen mention of bug(s) around corrupt indexes on the secondary node,
there was a fix in 9.1.6, but it doesn't appear to be the same problem I'm
having. 

I've also seen other people with similar errors that aren't using
pg_bulkload, for example:
http://postgresql.1045698.n5.nabble.com/BUG-7562-could-not-read-block-0-in-f
ile-quot-base-16385-16585-quot-read-only-0-of-8192-bytes-td5724738.html

Cheers,

James

-----Original Message-----
From: Jeff Janes [mailto:jeff.janes@xxxxxxxxx] 
Sent: 05 December 2012 18:02
To: James Cowell
Cc: pgsql-general@xxxxxxxxxxxxxx
Subject: Re:  Corrupt indexes on slave when using pg_bulkload on
master

On Wed, Dec 5, 2012 at 5:17 AM, James Cowell <jcowell@xxxxxxxxxxxxxx> wrote:
> I'm using pg_bulkload to load large amounts of CSV data into a 
> postgres database hourly.
>
> This database is replicated to a second node.

pg_bulkload is fundamentally incompatible with PITR, streaming, and forms of
replication that depend on those things.

Cheers,

Jeff

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

I'm using pg_bulkload to load large amounts of CSV data into a postgres database hourly.

This database is replicated to a second node.

Whenever a bulk load happens the indexes on the updated tables on the secondary node corrupt and are unusable until a reindex is run on the primary node.  I get the error below on node 2:

ERROR: index "tablename" contains unexpected zero page at block 0
SQL state: XX002
Hint: Please REINDEX it.

I'm assuming that this is because of the way pg_bulkload builds the index on the primary, and possibly has something to do with the way pg_bulkload overwrites rows in the event of a constraint violation, but at the same time if something works on the primary shouldn't the replicated node be able to process the WAL log?

I've tried this on 9.1.6 and 9.1.5 on RHEL 6.3 with pg_bulkload build 3.1.1-1.pg91.rhel6 and it happens every time.

Does anyone have any experience in this area or advice they could give?  If you can point out something stupid I'm doing that
 would be very welcome :)

Thanks,

James