Ok. Problem solved.

We were mounting the file system with:

    mount -t glusterfs -o volume-name=cache /etc/glusterfs/replicatedb.vol /mnt/replicate

So I dropped the db and the tablespace and remounted the gluster share as:

    mount -t glusterfs -o volume-name=replicate /etc/glusterfs/replicatedb.vol /mnt/replicate

After that our full database restore completed with no errors. This is a great thing!
(A rough fstab sketch for the corrected mount follows the volfile excerpt below.)

As you can see, volume-name=cache mounts the io-cache volume, which sits on top of
write-behind, and write-behind seemed to be causing the problems:

> 43: volume writebehind
> 44: type performance/write-behind
> 45: option page-size 128KB
> 46: option cache-size 1MB
> 47: subvolumes replicate
> 48: end-volume
> 50: volume cache
> 51: type performance/io-cache
> 52: option cache-size 512MB
> 53: subvolumes writebehind
> 54: end-volume
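To make the corrected mount persistent across reboots we are thinking of an fstab
entry along these lines. This is an untested sketch: it assumes the mount.glusterfs
helper that ships with 2.0 passes volume-name through as a mount option. If it does
not, we will just keep running the explicit mount command above from an init script.

    # sketch only: mount the 'replicate' volume directly, bypassing write-behind/io-cache
    /etc/glusterfs/replicatedb.vol  /mnt/replicate  glusterfs  volume-name=replicate  0  0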
Does anybody on the list have any experience with, or references for, running a
postgres cluster on glusterfs? Presumably we will be configuring some kind of
active/standby setup with two hosts sharing a file system, with failover handled
by linux-ha/heartbeat.

Cheers,
Jeff

On Mar 30, 2009, at 11:40 AM, Jeff Lord wrote:

> So we upgraded to the latest GIT release.
> Still seeing the errors from postgres during restore.
> Here is our client log.
>
> ================================================================================
> Version      : glusterfs 2.0.0git built on Mar 30 2009 10:40:10
> TLA Revision : git://git.sv.gnu.org/gluster.git
> Starting Time: 2009-03-30 11:05:09
> Command line : /usr/sbin/glusterfs --log-level=WARNING --volfile=/etc/glusterfs/replicatedb.vol --volume-name=cache /mnt/replicate
> PID          : 18470
> System name  : Linux
> Nodename     : gfs01-hq.hq.msrch
> Kernel Release : 2.6.18-53.el5PAE
> Hardware Identifier: i686
>
> Given volfile:
> +------------------------------------------------------------------------------+
>  1: volume posix
>  2: type storage/posix
>  3: option directory /mnt/sdb1
>  4: end-volume
>  5:
>  6: volume locks
>  7: type features/locks
>  8: subvolumes posix
>  9: end-volume
> 10:
> 11: volume brick
> 12: type performance/io-threads
> 13: subvolumes locks
> 14: end-volume
> 15:
> 16: volume server
> 17: type protocol/server
> 18: option transport-type tcp
> 19: option auth.addr.brick.allow *
> 20: subvolumes brick
> 21: end-volume
> 22:
> 23: volume gfs01-hq.hq.msrch
> 24: type protocol/client
> 25: option transport-type tcp
> 26: option remote-host gfs01-hq
> 27: option remote-subvolume brick
> 28: end-volume
> 29:
> 30: volume gfs02-hq.hq.msrch
> 31: type protocol/client
> 32: option transport-type tcp
> 33: option remote-host gfs02-hq
> 34: option remote-subvolume brick
> 35: end-volume
> 36:
> 37: volume replicate
> 38: type cluster/replicate
> 39: option favorite-child gfs01-hq.hq.msrch
> 40: subvolumes gfs01-hq.hq.msrch gfs02-hq.hq.msrch
> 41: end-volume
> 42:
> 43: volume writebehind
> 44: type performance/write-behind
> 45: option page-size 128KB
> 46: option cache-size 1MB
> 47: subvolumes replicate
> 48: end-volume
> 49:
> 50: volume cache
> 51: type performance/io-cache
> 52: option cache-size 512MB
> 53: subvolumes writebehind
> 54: end-volume
> 55:
> +------------------------------------------------------------------------------+
>
> 2009-03-30 11:05:09 W [afr.c:2118:init] replicate: You have specified subvolume 'gfs01-hq.hq.msrch' as the 'favorite child'. This means that if a discrepancy in the content or attributes (ownership, permission, etc.) of a file is detected among the subvolumes, the file on 'gfs01-hq.hq.msrch' will be considered the definitive version and its contents will OVERWRITE the contents of the file on other subvolumes. All versions of the file except that on 'gfs01-hq.hq.msrch' WILL BE LOST.
> 2009-03-30 11:05:09 W [glusterfsd.c:451:_log_if_option_is_invalid] writebehind: option 'page-size' is not recognized
> 2009-03-30 11:05:09 E [socket.c:729:socket_connect_finish] gfs01-hq.hq.msrch: connection failed (Connection refused)
> 2009-03-30 11:05:09 E [socket.c:729:socket_connect_finish] gfs01-hq.hq.msrch: connection failed (Connection refused)
> 2009-03-30 11:05:09 W [client-protocol.c:6162:client_setvolume_cbk] gfs01-hq.hq.msrch: attaching to the local volume 'brick'
> 2009-03-30 11:05:19 W [client-protocol.c:6162:client_setvolume_cbk] gfs01-hq.hq.msrch: attaching to the local volume 'brick'
>
>
> pg_restore -U entitystore -d entitystore --no-owner -n public entitystore
> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> pg_restore: [archiver (db)] Error from TOC entry 1829; 0 147089 TABLE DATA entity_medio-canon-all-0 entitystore
> pg_restore: [archiver (db)] COPY failed: ERROR: unexpected data beyond EOF in block 193028 of relation "entity_medio-canon-all-0"
> HINT: This has been seen to occur with buggy kernels; consider updating your system.
> CONTEXT: COPY entity_medio-canon-all-0, line 2566804: "medio-canon-all-0 1.mut_113889250837115899 \\340\\000\\000\\001\\0008\\317\\002ns2.http://schemas.me..."
>
> pg_restore: [archiver (db)] Error from TOC entry 1834; 0 147124 TABLE DATA entity_vzw-wthan-music-2 entitystore
> pg_restore: [archiver (db)] COPY failed: ERROR: unexpected data beyond EOF in block 148190 of relation "entity_vzw-wthan-music-2"
> HINT: This has been seen to occur with buggy kernels; consider updating your system.
> CONTEXT: COPY entity_vzw-wthan-music-2, line 1366994: "vzw-wthan-music-2 11080009 \\340\\000\\000\\001\\0008\\317\\002ns2.http://schemas.medio.com/usearch/..."
> WARNING: errors ignored on restore: 2
>
> On Mar 27, 2009, at 9:57 PM, Vikas Gorur wrote:
>
>> 2009/3/28 Jeff Lord <jlord at mediosystems.com>:
>>> We are attempting to run a postgres cluster which is composed of two
>>> nodes, each mirroring the data on the other. Gluster config is
>>> identical on each node:
>>>
>>> The issue seems to be related to using gluster, as when I attempt the
>>> same restore to local (non-replicated) disk it works fine.
>>> Is there something amiss in our gluster config? Should we be doing
>>> something different?
>>
>> What does the client log say?
>> Which version are you using? If you're using any version < 2.0.0RC7,
>> could you please try with RC7 or later and see if the problem is still
>> there?
>>
>> Vikas
>> --
>> Engineer - Z Research
>> http://gluster.com/
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users