Buggy writebehind translators !!!

Finally, I succeeded in compiling the latest patched version of glusterfs.

I configured and compiled the latest patched sources fetched with tla, though I had a problem with an older automake on my servers. The same archive went through autogen.sh and configure just fine on the client computer with CentOS 5.0, so I copied the whole tree to the destination server and ran ./configure there. Everything went OK.

I started the three servers, mounted the client locally, and tried the PostgreSQL database on the mounted disk again.

No more "Transport endpoint is not connected" errors, BUT the database cannot complete a simple import into a table. It complains about "unexpected data beyond EOF" in a file block:

COPY animal (id, stare_animal_fk, rasa_fk, sex, data_inregistrare, data_nasterii, prima_exploatatie_fk, primul_proprietar_fk, cod_anarz, data_trecere_granita, tara_origine_fk, cod_crotalie_non_eu, crotalie_mama, observatii, data_upload, versiune) FROM stdin;
ERROR:  unexpected data beyond EOF in block 2480 of relation "animal"
HINT: This has been seen to occur with buggy kernels; consider updating your system.

I tried 5 times and got exactly the same error each time! I suspect some data corruption when the data file blocks are assembled.

The client volume is configured with the AFR, READAHEAD and WRITEBEHIND translators like this:

volume afr
 type cluster/afr
 subvolumes client1 client2 client3
 option replicate *:3
end-volume

volume writebehind
  type performance/write-behind
  option aggregate-size 131072 # aggregate block size in bytes
  subvolumes afr
end-volume

volume readahead
  type performance/read-ahead
  option page-size 131072 ### size in bytes
  option page-count 16 ### page-size x page-count is the amount of read-ahead data per file
  subvolumes writebehind
end-volume

I suspected that readahead and writebehind might have some problems, so I commented them out, leaving the afr volume alone, non-optimized. The operations were obviously slower, but everything went OK: I tried multiple reads and updates and saw no errors.

Then I tested just the writebehind translator in order to point exactly at the buggy code. I reintroduced only the writebehind translator and everything seemed to work: I imported the table and did 9 full updates on over 700,000 records. But when I tried to vacuum the table ... dang, another error:

glu=# update animal set observatii='ok1';
UPDATE 713268
glu=# update animal set observatii='ok2';
UPDATE 713268
.............
glu=# update animal set observatii='ok8';
UPDATE 713268
glu=# update animal set observatii='ok9';
UPDATE 713268
glu=# vacuum full analyze;
ERROR: could not read block 69998 of relation 531069804/531069805/531069806: File descriptor in bad state
glu=# vacuum full analyze;
ERROR: could not read block 69998 of relation 531069804/531069805/531069806: File descriptor in bad state

I dropped the database, deleted the files, cleaned everything, and started over with fresh, empty volumes. I created the database again, imported the table (OK), updated it once, then vacuum -> ERROR
glu=# update animal set observatii='ok1';
UPDATE 713268
glu=# vacuum full analyze;
ERROR: could not read block 478 of relation 531783093/531783094/531783095: File descriptor in bad state

I removed the write-behind translator, activated read-ahead, and ran the same tests all over again -> EVERYTHING IS OK!
So, the write-behind translator should be revised.
How can I help you in order to pinpoint the bug?
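
One way to take PostgreSQL out of the picture might be a small stand-alone check that mimics what the database does with a relation file: extend it one block at a time, then read every block back. The following is only a sketch under assumptions. It assumes PostgreSQL's default 8192-byte block size, and the names check_file, make_block and wb_check.dat are made up for illustration. The directory to test (for example the glusterfs mount point) is passed as the first argument. It has not been verified to trigger the errors above.

```python
import os
import sys
import tempfile

# Assumption: PostgreSQL's default block size of 8 KB.
BLOCK_SIZE = 8192


def make_block(index: int) -> bytes:
    """Build a block whose content encodes its own index,
    so data that lands in the wrong place is detectable."""
    stamp = f"block-{index:010d}|".encode("ascii")
    repeats = BLOCK_SIZE // len(stamp) + 1
    return (stamp * repeats)[:BLOCK_SIZE]


def check_file(path: str, block_count: int) -> list:
    """Extend a file block by block, then read all blocks back.
    Returns a list of human-readable problem descriptions."""
    problems = []
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for i in range(block_count):
            # Before extending, ask the filesystem where the file ends.
            # A wrong answer here is one way to end up with data
            # "beyond EOF" from the application's point of view.
            end = os.lseek(fd, 0, os.SEEK_END)
            if end != i * BLOCK_SIZE:
                problems.append(
                    f"size before block {i}: expected {i * BLOCK_SIZE}, got {end}"
                )
            os.lseek(fd, i * BLOCK_SIZE, os.SEEK_SET)
            written = os.write(fd, make_block(i))
            if written != BLOCK_SIZE:
                problems.append(f"short write on block {i}: {written} bytes")

        # Read every block back through the same descriptor and compare.
        for i in range(block_count):
            os.lseek(fd, i * BLOCK_SIZE, os.SEEK_SET)
            try:
                data = os.read(fd, BLOCK_SIZE)
            except OSError as exc:
                problems.append(f"read of block {i} failed: {exc}")
                continue
            if data != make_block(i):
                problems.append(
                    f"block {i} content mismatch ({len(data)} bytes read)"
                )
    finally:
        os.close(fd)
    return problems


if __name__ == "__main__":
    # First argument: directory to test; falls back to the system temp dir.
    target_dir = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    # Second argument: number of blocks to write (default 4096, i.e. 32 MB).
    count = int(sys.argv[2]) if len(sys.argv) > 2 else 4096
    target = os.path.join(target_dir, "wb_check.dat")
    found = check_file(target, count)
    os.remove(target)
    if found:
        for line in found:
            print(line)
    else:
        print(f"{count} blocks written and read back without problems")
```

Running it once against the mount with write-behind enabled and once without would show whether the problem reproduces outside the database. If it does not, the access pattern probably needs to be closer to what vacuum does (rewriting blocks in place, several open descriptors), which this sketch does not attempt.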

--
Constantin Teodorescu
Braila



