Help needed in understanding GlusterFS logs and debugging elasticsearch failures

Sachidananda URS <surs@xxxxxxxxxx> · Fri, 11 Dec 2015 20:56:04 +0530

Hi,

I am trying to use GlusterFS as the backend filesystem for elasticsearch,
storing the indices on a GlusterFS mount.

As far as I can understand the filesystem operations, the lucene engine
does a lot of renames on the index files, and multiple threads read
from the same file concurrently.
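
To make that access pattern concrete, here is a minimal sketch of it in
Python: write a file, rename it into place, then read it from several
threads and compare digests. It is not what lucene actually does
internally. The function names, file names and sizes are my own
placeholders, and it only approximates the rename plus concurrent read
behaviour described above.

```python
import hashlib
import os
import tempfile
import threading


def write_then_rename(directory, final_name, payload):
    """Write payload to a temporary file, fsync it, then rename it into place."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        handle.write(payload)
        handle.flush()
        os.fsync(handle.fileno())
    final_path = os.path.join(directory, final_name)
    os.rename(tmp_path, final_path)
    return final_path


def concurrent_digests(path, readers=4):
    """Read the same file from several threads; return one SHA-256 per reader."""
    digests = [None] * readers

    def read_all(slot):
        with open(path, "rb") as handle:
            digests[slot] = hashlib.sha256(handle.read()).hexdigest()

    threads = [threading.Thread(target=read_all, args=(i,)) for i in range(readers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return digests


def run_round(directory, rounds=20, size=1 << 20):
    """Return names of files whose read-back digest differs from what was written."""
    mismatches = []
    for i in range(rounds):
        payload = os.urandom(size)
        expected = hashlib.sha256(payload).hexdigest()
        name = "segment-%d.bin" % i  # hypothetical file name
        path = write_then_rename(directory, name, payload)
        if any(digest != expected for digest in concurrent_digests(path)):
            mismatches.append(name)
    return mismatches
```

Calling run_round() on a scratch directory under the GlusterFS mount and
again on a local disk directory should show whether this simple pattern
alone is enough to produce mismatches. An empty list means every reader
saw the bytes that were written.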

While writing the index, elasticsearch/lucene complains of index corruption, the
health of the cluster goes to red, and all operations on the index fail
thereafter.

===================

[2015-12-10 02:43:45,614][WARN ][index.engine             ] [client-2] [logstash-2015.12.09][3] failed engine [merge failed]
org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=0 actual=6d811d06 (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/mnt/gluster2/rhs/nodes/0/indices/logstash-2015.12.09/3/index/_a7.cfs") [slice=_a7_Lucene50_0.doc]))
        at org.elasticsearch.index.engine.InternalEngine$EngineMergeScheduler$1.doRun(InternalEngine.java:1233)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=0 actual=6d811d06 (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/mnt/gluster2/rhs/nodes/0/indices/logstash-2015.12.09/3/index/_a7.cfs") [slice=_a7_Lucene50_0.doc]))

=====================
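
One way to narrow this down would be to check the footer checksum of an
index file as read through the client mount, and then again directly on
the bricks. The sketch below assumes the footer layout that, as far as I
understand it, Lucene 5.x uses: the last 16 bytes are a 4-byte magic
(0xC02893E8), a 4-byte algorithm ID (0) and an 8-byte big-endian CRC32
covering every byte before the checksum field. That layout is my
assumption and should be checked against the Lucene sources. The
function name check_footer is a placeholder.

```python
import struct
import zlib

# Assumed values, taken from my reading of Lucene's CodecUtil; not verified here.
FOOTER_MAGIC = 0xC02893E8  # bitwise complement of the codec header magic 0x3FD76C17
FOOTER_LENGTH = 16         # magic (4) + algorithm ID (4) + checksum (8)


def check_footer(path):
    """Return (stored_crc, computed_crc) for a file with the assumed footer layout."""
    with open(path, "rb") as handle:
        data = handle.read()  # reads the whole file; fine for a sketch
    if len(data) < FOOTER_LENGTH:
        raise ValueError("file too short to hold a footer")
    magic, algorithm, stored = struct.unpack(">IiQ", data[-FOOTER_LENGTH:])
    if magic != FOOTER_MAGIC:
        raise ValueError("unexpected footer magic: %08x" % magic)
    if algorithm != 0:
        raise ValueError("unexpected checksum algorithm: %d" % algorithm)
    # The CRC covers everything up to, but not including, the 8 checksum bytes.
    computed = zlib.crc32(data[:-8]) & 0xFFFFFFFF
    return stored, computed
```

This only checks a whole file such as _a7.cfs. The failure above is on a
slice inside the compound file, and locating a slice would need the
offsets from the matching .cfe file, which this sketch does not parse.
If the two returned values differ on the mount but agree on a brick, that
would point at the client side rather than at the data on disk.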

The server logs do not have anything. The client logs are full of messages like:

[2015-12-03 18:44:17.882032] I [MSGID: 109066] [dht-rename.c:1410:dht_rename] 0-esearch-dht: renaming /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-61881676454442626.tlog (hash=esearch-replicate-0/cache=esearch-replicate-0) => /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-311.ckp (hash=esearch-replicate-1/cache=<nul>)
[2015-12-03 18:45:31.276316] I [MSGID: 109066] [dht-rename.c:1410:dht_rename] 0-esearch-dht: renaming /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-2384654015514619399.tlog (hash=esearch-replicate-0/cache=esearch-replicate-0) => /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-312.ckp (hash=esearch-replicate-0/cache=<nul>)
[2015-12-03 18:45:31.587660] I [MSGID: 109066] [dht-rename.c:1410:dht_rename] 0-esearch-dht: renaming /rhs/nodes/0/indices/logstash-2015.12.03/4/translog/translog-4957943728738197940.tlog (hash=esearch-replicate-0/cache=esearch-replicate-0) => /rhs/nodes/0/indices/logstash-2015.12.03/4/translog/translog-312.ckp (hash=esearch-replicate-0/cache=<nul>)
[2015-12-03 18:46:48.424605] I [MSGID: 109066] [dht-rename.c:1410:dht_rename] 0-esearch-dht: renaming /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-1731620600607498012.tlog (hash=esearch-replicate-1/cache=esearch-replicate-1) => /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-313.ckp (hash=esearch-replicate-1/cache=<nul>)
[2015-12-03 18:46:48.466558] I [MSGID: 109066] [dht-rename.c:1410:dht_rename] 0-esearch-dht: renaming /rhs/nodes/0/indices/logstash-2015.12.03/4/translog/translog-5214949393126318982.tlog (hash=esearch-replicate-1/cache=esearch-replicate-1) => /rhs/nodes/0/indices/logstash-2015.12.03/4/translog/translog-313.ckp (hash=esearch-replicate-1/cache=<nul>)
[2015-12-03 18:48:06.314138] I [MSGID: 109066] [dht-rename.c:1410:dht_rename] 0-esearch-dht: renaming /rhs/nodes/0/indices/logstash-2015.12.03/4/translog/translog-9110755229226773921.tlog (hash=esearch-replicate-0/cache=esearch-replicate-0) => /rhs/nodes/0/indices/logstash-2015.12.03/4/translog/translog-314.ckp (hash=esearch-replicate-1/cache=<nul>)
[2015-12-03 18:48:06.332919] I [MSGID: 109066] [dht-rename.c:1410:dht_rename] 0-esearch-dht: renaming /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-5193443717817038271.tlog (hash=esearch-replicate-1/cache=esearch-replicate-1) => /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-314.ckp (hash=esearch-replicate-1/cache=<nul>)
[2015-12-03 18:49:24.694263] I [MSGID: 109066] [dht-rename.c:1410:dht_rename] 0-esearch-dht: renaming /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-2750483795035758522.tlog (hash=esearch-replicate-1/cache=esearch-replicate-1) => /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-315.ckp (hash=esearch-replicate-0/cache=<nul>)

==============================================================
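
To get an overview of these messages across the whole client log, a small
parser like the one below could group the renames. It assumes the message
format shown above. It also assumes that hash= names the subvolume the
file name hashes to and cache= names the subvolume that currently holds
the file, which is my reading of the DHT terminology and not something I
have confirmed. The names parse_rename and summarize are placeholders.

```python
import re
from collections import Counter

# Matches the dht_rename message body as it appears in the client log above.
RENAME_RE = re.compile(
    r"renaming (?P<src>\S+) "
    r"\(hash=(?P<src_hash>[^/]+)/cache=(?P<src_cache>[^)]+)\) => "
    r"(?P<dst>\S+) "
    r"\(hash=(?P<dst_hash>[^/]+)/cache=(?P<dst_cache>[^)]+)\)"
)


def parse_rename(line):
    """Return a dict of the rename fields, or None if the line is not a rename."""
    match = RENAME_RE.search(line)
    return match.groupdict() if match else None


def summarize(lines):
    """Count renames whose destination hashes to a different subvolume
    than the one the source file is cached on (assumed meaning of the fields)."""
    counts = Counter()
    for line in lines:
        info = parse_rename(line)
        if info is None:
            continue
        same = info["src_cache"] == info["dst_hash"]
        counts["same_subvolume" if same else "cross_subvolume"] += 1
    return counts
```

For the eight lines quoted above, this gives 3 renames where the
destination name hashes to a different subvolume than the one holding the
source file, and 5 where both are the same. Running summarize() over the
open client log file would show whether the corrupted index files fall
into one group more often than the other.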

The same setup works well on any of the disk filesystems.
This is a 2 x 2 distributed-replicate setup:

# gluster vol info

Volume Name: esearch
Type: Distributed-Replicate
Volume ID: 4e4b205e-28ed-4f9e-9fa4-0d020428dede
Status: Started
Number of Bricks: 2 x 2 = 4
Transport-type: tcp,rdma
Bricks:
Brick1: 10.70.47.171:/gluster/brick1
Brick2: 10.70.47.187:/gluster/brick1
Brick3: 10.70.47.121:/gluster/brick1
Brick4: 10.70.47.172:/gluster/brick1
Options Reconfigured:
performance.read-ahead: off
performance.write-behind: off

I need a little help in understanding the failures. Let me know if you need
further information on the setup or access to the system to debug further. I've
attached the debug logs for further investigation.

-sac

Attachment: mnt-gluster.log.bz2
Description: BZip2 compressed data
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel