Ok. Problem solved.

We were mounting the file system with:

    mount -t glusterfs -o volume-name=cache /etc/glusterfs/replicatedb.vol /mnt/replicate

So I dropped the db and the tablespace and remounted the gluster share as:

    mount -t glusterfs -o volume-name=replicate /etc/glusterfs/replicatedb.vol /mnt/replicate

After that our full database restore completed with no errors. This is a great thing!
(A rough fstab sketch for the corrected mount follows the volfile excerpt below.)

As you can see, volume-name=cache mounts the io-cache volume, which sits on top of
write-behind, and write-behind seemed to be causing the problems:

> 43: volume writebehind
> 44: type performance/write-behind
> 45: option page-size 128KB
> 46: option cache-size 1MB
> 47: subvolumes replicate
> 48: end-volume
> 50: volume cache
> 51: type performance/io-cache
> 52: option cache-size 512MB
> 53: subvolumes writebehind
> 54: end-volume
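To make the corrected mount persistent across reboots we are thinking of an fstab
entry along these lines. This is an untested sketch: it assumes the mount.glusterfs
helper that ships with 2.0 passes volume-name through as a mount option. If it does
not, we will just keep running the explicit mount command above from an init script.

    # sketch only: mount the 'replicate' volume directly, bypassing write-behind/io-cache
    /etc/glusterfs/replicatedb.vol  /mnt/replicate  glusterfs  volume-name=replicate  0  0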
Does anybody on the list have any experience with, or references for, running a
postgres cluster on glusterfs? Presumably we will be configuring some kind of
active/standby setup with two hosts sharing a file system, with failover handled
by linux-ha/heartbeat.

Cheers,
Jeff

On Mar 30, 2009, at 11:40 AM, Jeff Lord wrote:

> So we upgraded to the latest GIT release.
> Still seeing the errors from postgres during restore.
> Here is our client log.
>
> ================================================================================
> Version      : glusterfs 2.0.0git built on Mar 30 2009 10:40:10
> TLA Revision : git://git.sv.gnu.org/gluster.git
> Starting Time: 2009-03-30 11:05:09
> Command line : /usr/sbin/glusterfs --log-level=WARNING --volfile=/etc/glusterfs/replicatedb.vol --volume-name=cache /mnt/replicate
> PID          : 18470
> System name  : Linux
> Nodename     : gfs01-hq.hq.msrch
> Kernel Release : 2.6.18-53.el5PAE
> Hardware Identifier: i686
>
> Given volfile:
> +------------------------------------------------------------------------------+
>  1: volume posix
>  2: type storage/posix
>  3: option directory /mnt/sdb1
>  4: end-volume
>  5:
>  6: volume locks
>  7: type features/locks
>  8: subvolumes posix
>  9: end-volume
> 10:
> 11: volume brick
> 12: type performance/io-threads
> 13: subvolumes locks
> 14: end-volume
> 15:
> 16: volume server
> 17: type protocol/server
> 18: option transport-type tcp
> 19: option auth.addr.brick.allow *
> 20: subvolumes brick
> 21: end-volume
> 22:
> 23: volume gfs01-hq.hq.msrch
> 24: type protocol/client
> 25: option transport-type tcp
> 26: option remote-host gfs01-hq
> 27: option remote-subvolume brick
> 28: end-volume
> 29:
> 30: volume gfs02-hq.hq.msrch
> 31: type protocol/client
> 32: option transport-type tcp
> 33: option remote-host gfs02-hq
> 34: option remote-subvolume brick
> 35: end-volume
> 36:
> 37: volume replicate
> 38: type cluster/replicate
> 39: option favorite-child gfs01-hq.hq.msrch
> 40: subvolumes gfs01-hq.hq.msrch gfs02-hq.hq.msrch
> 41: end-volume
> 42:
> 43: volume writebehind
> 44: type performance/write-behind
> 45: option page-size 128KB
> 46: option cache-size 1MB
> 47: subvolumes replicate
> 48: end-volume
> 49:
> 50: volume cache
> 51: type performance/io-cache
> 52: option cache-size 512MB
> 53: subvolumes writebehind
> 54: end-volume
> 55:
> +------------------------------------------------------------------------------+
>
> 2009-03-30 11:05:09 W [afr.c:2118:init] replicate: You have specified subvolume 'gfs01-hq.hq.msrch' as the 'favorite child'. This means that if a discrepancy in the content or attributes (ownership, permission, etc.) of a file is detected among the subvolumes, the file on 'gfs01-hq.hq.msrch' will be considered the definitive version and its contents will OVERWRITE the contents of the file on other subvolumes. All versions of the file except that on 'gfs01-hq.hq.msrch' WILL BE LOST.
> 2009-03-30 11:05:09 W [glusterfsd.c:451:_log_if_option_is_invalid] writebehind: option 'page-size' is not recognized
> 2009-03-30 11:05:09 E [socket.c:729:socket_connect_finish] gfs01-hq.hq.msrch: connection failed (Connection refused)
> 2009-03-30 11:05:09 E [socket.c:729:socket_connect_finish] gfs01-hq.hq.msrch: connection failed (Connection refused)
> 2009-03-30 11:05:09 W [client-protocol.c:6162:client_setvolume_cbk] gfs01-hq.hq.msrch: attaching to the local volume 'brick'
> 2009-03-30 11:05:19 W [client-protocol.c:6162:client_setvolume_cbk] gfs01-hq.hq.msrch: attaching to the local volume 'brick'
>
>
> pg_restore -U entitystore -d entitystore --no-owner -n public entitystore
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 1829; 0 147089 TABLE DATA entity_medio-canon-all-0 entitystore
> pg_restore: [archiver (db)] COPY failed: ERROR: unexpected data beyond EOF in block 193028 of relation "entity_medio-canon-all-0"
> HINT: This has been seen to occur with buggy kernels; consider updating your system.
> CONTEXT: COPY entity_medio-canon-all-0, line 2566804: "medio-canon-all-0 1.mut_113889250837115899 \\340\\000\\000\\001\\0008\\317\\002ns2.http://schemas.me..."
>
> pg_restore: [archiver (db)] Error from TOC entry 1834; 0 147124 TABLE DATA entity_vzw-wthan-music-2 entitystore
> pg_restore: [archiver (db)] COPY failed: ERROR: unexpected data beyond EOF in block 148190 of relation "entity_vzw-wthan-music-2"
> HINT: This has been seen to occur with buggy kernels; consider updating your system.
> CONTEXT: COPY entity_vzw-wthan-music-2, line 1366994: "vzw-wthan-music-2 11080009 \\340\\000\\000\\001\\0008\\317\\002ns2.http://schemas.medio.com/usearch/..."
> WARNING: errors ignored on restore: 2
>
> On Mar 27, 2009, at 9:57 PM, Vikas Gorur wrote:
>
>> 2009/3/28 Jeff Lord <jlord at mediosystems.com>:
>>> We are attempting to run a postgres cluster which is composed of two
>>> nodes, each mirroring the data on the other. Gluster config is
>>> identical on each node:
>>>
>>> The issue seems to be related to using gluster, as when I attempt the
>>> same restore to local (non-replicated) disk it works fine.
>>> Is there something amiss in our gluster config? Should we be doing
>>> something different?
>>
>> What does the client log say?
>> Which version are you using? If you're using any version < 2.0.0RC7,
>> could you please try with RC7 or later and see if the problem is still
>> there?
>>
>> Vikas
>> --
>> Engineer - Z Research
>> http://gluster.com/
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users