So we upgraded to the latest Git release. We are still seeing the errors from Postgres during restore. Here is our client log:

========================================================================
Version            : glusterfs 2.0.0git built on Mar 30 2009 10:40:10
TLA Revision       : git://git.sv.gnu.org/gluster.git
Starting Time      : 2009-03-30 11:05:09
Command line       : /usr/sbin/glusterfs --log-level=WARNING --volfile=/etc/glusterfs/replicatedb.vol --volume-name=cache /mnt/replicate
PID                : 18470
System name        : Linux
Nodename           : gfs01-hq.hq.msrch
Kernel Release     : 2.6.18-53.el5PAE
Hardware Identifier: i686

Given volfile:
+------------------------------------------------------------------------------+
  1: volume posix
  2:   type storage/posix
  3:   option directory /mnt/sdb1
  4: end-volume
  5:
  6: volume locks
  7:   type features/locks
  8:   subvolumes posix
  9: end-volume
 10:
 11: volume brick
 12:   type performance/io-threads
 13:   subvolumes locks
 14: end-volume
 15:
 16: volume server
 17:   type protocol/server
 18:   option transport-type tcp
 19:   option auth.addr.brick.allow *
 20:   subvolumes brick
 21: end-volume
 22:
 23: volume gfs01-hq.hq.msrch
 24:   type protocol/client
 25:   option transport-type tcp
 26:   option remote-host gfs01-hq
 27:   option remote-subvolume brick
 28: end-volume
 29:
 30: volume gfs02-hq.hq.msrch
 31:   type protocol/client
 32:   option transport-type tcp
 33:   option remote-host gfs02-hq
 34:   option remote-subvolume brick
 35: end-volume
 36:
 37: volume replicate
 38:   type cluster/replicate
 39:   option favorite-child gfs01-hq.hq.msrch
 40:   subvolumes gfs01-hq.hq.msrch gfs02-hq.hq.msrch
 41: end-volume
 42:
 43: volume writebehind
 44:   type performance/write-behind
 45:   option page-size 128KB
 46:   option cache-size 1MB
 47:   subvolumes replicate
 48: end-volume
 49:
 50: volume cache
 51:   type performance/io-cache
 52:   option cache-size 512MB
 53:   subvolumes writebehind
 54: end-volume
 55:
+------------------------------------------------------------------------------+
2009-03-30 11:05:09 W [afr.c:2118:init]
replicate: You have specified subvolume 'gfs01-hq.hq.msrch' as the 'favorite child'. This means that if a discrepancy in the content or attributes (ownership, permission, etc.) of a file is detected among the subvolumes, the file on 'gfs01-hq.hq.msrch' will be considered the definitive version and its contents will OVERWRITE the contents of the file on other subvolumes. All versions of the file except that on 'gfs01-hq.hq.msrch' WILL BE LOST.
2009-03-30 11:05:09 W [glusterfsd.c:451:_log_if_option_is_invalid] writebehind: option 'page-size' is not recognized
2009-03-30 11:05:09 E [socket.c:729:socket_connect_finish] gfs01-hq.hq.msrch: connection failed (Connection refused)
2009-03-30 11:05:09 E [socket.c:729:socket_connect_finish] gfs01-hq.hq.msrch: connection failed (Connection refused)
2009-03-30 11:05:09 W [client-protocol.c:6162:client_setvolume_cbk] gfs01-hq.hq.msrch: attaching to the local volume 'brick'
2009-03-30 11:05:19 W [client-protocol.c:6162:client_setvolume_cbk] gfs01-hq.hq.msrch: attaching to the local volume 'brick'

pg_restore -U entitystore -d entitystore --no-owner -n public entitystore
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 1829; 0 147089 TABLE DATA entity_medio-canon-all-0 entitystore
pg_restore: [archiver (db)] COPY failed: ERROR: unexpected data beyond EOF in block 193028 of relation "entity_medio-canon-all-0"
HINT: This has been seen to occur with buggy kernels; consider updating your system.
CONTEXT: COPY entity_medio-canon-all-0, line 2566804: "medio-canon-all-0 1.mut_113889250837115899 \\340\\000\\000\\001\\0008\\317\\002ns2.http://schemas.me ..."
pg_restore: [archiver (db)] Error from TOC entry 1834; 0 147124 TABLE DATA entity_vzw-wthan-music-2 entitystore
pg_restore: [archiver (db)] COPY failed: ERROR: unexpected data beyond EOF in block 148190 of relation "entity_vzw-wthan-music-2"
HINT: This has been seen to occur with buggy kernels; consider updating your system.
CONTEXT: COPY entity_vzw-wthan-music-2, line 1366994: "vzw-wthan-music-2 11080009 \\340\\000\\000\\001\\0008\\317\\002ns2.http://schemas.medio.com/usearch/ ..."
WARNING: errors ignored on restore: 2

On Mar 27, 2009, at 9:57 PM, Vikas Gorur wrote:

> 2009/3/28 Jeff Lord <jlord at mediosystems.com>:
>> We are attempting to run a postgres cluster which is composed of
>> two nodes. Each mirroring the data on the other. Gluster config is
>> identical on each node:
>>
>> The issue seems to be related to using gluster, as when I attempt
>> the same restore to local (non-replicated disk) it works fine.
>> Is there something amiss in our gluster config? Should we be doing
>> something different?
>>
>
> What does the client log say?
> Which version are you using? If you're using any version < 2.0.0RC7,
> could you please try with RC7 or later and see if the problem is still
> there?
>
> Vikas
> --
> Engineer - Z Research
> http://gluster.com/
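
P.S. One thing we noticed in the log above: the warning "writebehind: option 'page-size' is not recognized" suggests that write-behind no longer accepts 'page-size' in this release (the 'cache-size' option on the same translator is not flagged). Our guess at a cleaned-up fragment for that section of the volfile, dropping the unrecognized option; option names should be checked against the write-behind documentation for this release:

```
volume writebehind
  type performance/write-behind
  # 'page-size 128KB' removed; flagged as unrecognized by 2.0.0git
  option cache-size 1MB
  subvolumes replicate
end-volume
```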