Finally, I succeeded in compiling the latest patched version of glusterfs.
I configured and compiled the latest patched sources fetched with tla;
though I had a problem with an older automake on my servers, the same
archive autogen.sh-ed and configured just fine on the client computer
running CentOS 5.0. So I picked up the whole generated tree, ran
./configure on the final destination server, and everything went OK.
I started the three servers, mounted the client locally, and tried the
PostgreSQL database on the mounted disk again.
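For reference, each server exports one brick through the usual
storage/posix + protocol/server pair; the paths and auth rules below are
only placeholders (and I may have extra translators on the bricks), but
the layout is along these lines:

volume brick
type storage/posix
option directory /data/export      # placeholder path
end-volume

volume server
type protocol/server
option transport-type tcp/server
option auth.ip.brick.allow *       # placeholder auth rule
subvolumes brick
end-volume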
No more "Transport endpoint is not connected" errors .. BUT the database
cannot complete a simple import into a table complaining about some
"unexpected data beyond EOF in a file block"
COPY animal (id, stare_animal_fk, rasa_fk, sex, data_inregistrare,
data_nasterii, prima_exploatatie_fk, primul_proprietar_fk, cod_anarz,
data_trecere_granita, tara_origine_fk, cod_crotalie_non_eu,
crotalie_mama, observatii, data_upload, versiune) FROM stdin;
ERROR: unexpected data beyond EOF in block 2480 of relation "animal"
HINT: This has been seen to occur with buggy kernels; consider updating
your system.
I tried 5 times and got exactly the same error! I suspect some data
corruption when the data file blocks are assembled.
The client volume is configured with the AFR, READAHEAD and WRITEBEHIND
translators like this (the protocol/client subvolumes it references are
sketched right after the spec):
volume afr
type cluster/afr
subvolumes client1 client2 client3
option replicate *:3
end-volume
volume writebehind
type performance/write-behind
option aggregate-size 131072 # aggregate block size in bytes
subvolumes afr
end-volume
volume readahead
type performance/read-ahead
option page-size 131072 ### size in bytes
option page-count 16 ### page-size x page-count is the amount of read-ahead data per file
subvolumes writebehind
end-volume
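The client1, client2 and client3 subvolumes are plain protocol/client
translators, one per server; the host name below is a placeholder, the
rest is the standard definition:

volume client1
type protocol/client
option transport-type tcp/client
option remote-host server1         # placeholder host name
option remote-subvolume brick
end-volume
# client2 and client3 are identical except for remote-host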
I suspected that readahead & writebehind might have some problems, so I
commented them out, leaving the afr volume alone, non-optimized.
The operations were obviously slower, but everything went OK; I tried
multiple reads and updates and everything is OK now.
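So for that test the client graph was reduced to the bare afr volume on
top of the three protocol/client subvolumes, nothing else:

volume afr
type cluster/afr
subvolumes client1 client2 client3
option replicate *:3
end-volume
# no performance translators; afr is the top (mounted) volume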
Then I tried to test just the writebehind translator in order to
pinpoint the buggy code (the spec used for this test is sketched below,
after the psql output).
With only the writebehind translator reintroduced, everything seemed to
work: I imported the table and did 9 full updates on over 700,000
records, but when I tried to vacuum the table ... dang, another error:
glu=# update animal set observatii='ok1';
UPDATE 713268
glu=# update animal set observatii='ok2';
UPDATE 713268
.............
glu=# update animal set observatii='ok8';
UPDATE 713268
glu=# update animal set observatii='ok9';
UPDATE 713268
glu=# vacuum full analyze;
ERROR: could not read block 69998 of relation
531069804/531069805/531069806: File descriptor in bad state
glu=# vacuum full analyze;
ERROR: could not read block 69998 of relation
531069804/531069805/531069806: File descriptor in bad state
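To be clear about what this test ran, the spec had only the write-behind
performance volume stacked directly on afr, with readahead removed, so
write-behind was the top (mounted) volume:

volume writebehind
type performance/write-behind
option aggregate-size 131072       # same 128 KB aggregate size as above
subvolumes afr                     # sits directly on the afr volume
end-volume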
I dropped the database and the files, cleaned everything, and started
over again with fresh, empty volumes: created the database again,
imported the table (OK), updated once, then vacuum -> ERROR:
glu=# update animal set observatii='ok1';
UPDATE 713268
glu=# vacuum full analyze;
ERROR: could not read block 478 of relation
531783093/531783094/531783095: File descriptor in bad state
I removed the writebehind translator, activated readahead, and ran all
the same tests over and over again -> EVERYTHING IS OK !
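The readahead-only spec that works fine is the same stack with
read-ahead sitting directly on afr instead of on writebehind:

volume readahead
type performance/read-ahead
option page-size 131072
option page-count 16
subvolumes afr                     # directly on afr, writebehind removed
end-volume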
So, the write-behind translator should be revised.
How can I help you pinpoint the bug?
--
Constantin Teodorescu
Braila