----- Original Message -----
> From: "Sachidananda URS" <surs@xxxxxxxxxx>
> To: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> Sent: Friday, December 11, 2015 10:26:04 AM
> Subject: Help needed in understanding GlusterFS logs and debugging elasticsearch failures
>
> Hi,
>
> I was trying to use GlusterFS as a backend filesystem for storing the
> elasticsearch indices on a GlusterFS mount.
>
> As far as I can understand the filesystem operations, the lucene engine
> does a lot of renames on the index files, and multiple threads read
> from the same file concurrently.
>
> While writing the index, elasticsearch/lucene complains of index corruption,
> the health of the cluster goes to red, and all operations on the index fail
> thereafter.
>
> ===================
>
> [2015-12-10 02:43:45,614][WARN ][index.engine ] [client-2]
> [logstash-2015.12.09][3] failed engine [merge failed]
> org.apache.lucene.index.MergePolicy$MergeException:
> org.apache.lucene.index.CorruptIndexException: checksum failed
> (hardware problem?) : expected=0 actual=6d811d06
> (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/mnt/gluster2/rhs/nodes/0/indices/logstash-2015.12.09/3/index/_a7.cfs")
> [slice=_a7_Lucene50_0.doc]))
>     at org.elasticsearch.index.engine.InternalEngine$EngineMergeScheduler$1.doRun(InternalEngine.java:1233)
>     at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed
> (hardware problem?) : expected=0 actual=6d811d06
> (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/mnt/gluster2/rhs/nodes/0/indices/logstash-2015.12.09/3/index/_a7.cfs")
> [slice=_a7_Lucene50_0.doc]))
>
> =====================
>
> The server logs do not have anything. The client logs are full of messages like:
>
> [2015-12-03 18:44:17.882032] I [MSGID: 109066] [dht-rename.c:1410:dht_rename]
> 0-esearch-dht: renaming
> /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-61881676454442626.tlog
> (hash=esearch-replicate-0/cache=esearch-replicate-0) =>
> /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-311.ckp
> (hash=esearch-replicate-1/cache=<nul>)
> [2015-12-03 18:45:31.276316] I [MSGID: 109066] [dht-rename.c:1410:dht_rename]
> 0-esearch-dht: renaming
> /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-2384654015514619399.tlog
> (hash=esearch-replicate-0/cache=esearch-replicate-0) =>
> /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-312.ckp
> (hash=esearch-replicate-0/cache=<nul>)
> [2015-12-03 18:45:31.587660] I [MSGID: 109066] [dht-rename.c:1410:dht_rename]
> 0-esearch-dht: renaming
> /rhs/nodes/0/indices/logstash-2015.12.03/4/translog/translog-4957943728738197940.tlog
> (hash=esearch-replicate-0/cache=esearch-replicate-0) =>
> /rhs/nodes/0/indices/logstash-2015.12.03/4/translog/translog-312.ckp
> (hash=esearch-replicate-0/cache=<nul>)
> [2015-12-03 18:46:48.424605] I [MSGID: 109066] [dht-rename.c:1410:dht_rename]
> 0-esearch-dht: renaming
> /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-1731620600607498012.tlog
> (hash=esearch-replicate-1/cache=esearch-replicate-1) =>
> /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-313.ckp
> (hash=esearch-replicate-1/cache=<nul>)
> [2015-12-03 18:46:48.466558] I [MSGID: 109066] [dht-rename.c:1410:dht_rename]
> 0-esearch-dht: renaming
> /rhs/nodes/0/indices/logstash-2015.12.03/4/translog/translog-5214949393126318982.tlog
> (hash=esearch-replicate-1/cache=esearch-replicate-1) =>
> /rhs/nodes/0/indices/logstash-2015.12.03/4/translog/translog-313.ckp
> (hash=esearch-replicate-1/cache=<nul>)
> [2015-12-03 18:48:06.314138] I [MSGID: 109066] [dht-rename.c:1410:dht_rename]
> 0-esearch-dht: renaming
> /rhs/nodes/0/indices/logstash-2015.12.03/4/translog/translog-9110755229226773921.tlog
> (hash=esearch-replicate-0/cache=esearch-replicate-0) =>
> /rhs/nodes/0/indices/logstash-2015.12.03/4/translog/translog-314.ckp
> (hash=esearch-replicate-1/cache=<nul>)
> [2015-12-03 18:48:06.332919] I [MSGID: 109066] [dht-rename.c:1410:dht_rename]
> 0-esearch-dht: renaming
> /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-5193443717817038271.tlog
> (hash=esearch-replicate-1/cache=esearch-replicate-1) =>
> /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-314.ckp
> (hash=esearch-replicate-1/cache=<nul>)
> [2015-12-03 18:49:24.694263] I [MSGID: 109066] [dht-rename.c:1410:dht_rename]
> 0-esearch-dht: renaming
> /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-2750483795035758522.tlog
> (hash=esearch-replicate-1/cache=esearch-replicate-1) =>
> /rhs/nodes/0/indices/logstash-2015.12.03/1/translog/translog-315.ckp
> (hash=esearch-replicate-0/cache=<nul>)
>
> ==============================================================
>
> The same setup works well on any of the disk filesystems.
> This is a 2 x 2 distributed-replicate setup:
>
> # gluster vol info
>
> Volume Name: esearch
> Type: Distributed-Replicate
> Volume ID: 4e4b205e-28ed-4f9e-9fa4-0d020428dede
> Status: Started
> Number of Bricks: 2 x 2 = 4
> Transport-type: tcp,rdma
> Bricks:
> Brick1: 10.70.47.171:/gluster/brick1
> Brick2: 10.70.47.187:/gluster/brick1
> Brick3: 10.70.47.121:/gluster/brick1
> Brick4: 10.70.47.172:/gluster/brick1
> Options Reconfigured:
> performance.read-ahead: off
> performance.write-behind: off
>
>
> I need a little help in understanding the failures.
> Let me know if you need further information on the setup or access to
> the system to debug further. I've attached the debug logs for further
> investigation.
>

Would it be possible to turn off all the performance translators (md-cache,
quick-read, io-cache, etc.) and check if the same problem persists?

Collecting an strace of the elasticsearch process that does I/O on gluster
can also help.

Regards,
Vijay
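
The two suggestions above could be tried roughly as follows. This is only a sketch, assuming the volume name esearch from the quoted "gluster vol info" output. md-cache is controlled by the performance.stat-prefetch volume option, while quick-read and io-cache have options of the same name; read-ahead and write-behind are already off on this volume. The function names, the RUN dry-run switch and the strace output path are illustrative and not from the thread.

```shell
#!/bin/sh
# Volume name taken from the quoted "gluster vol info" output.
VOLUME="${VOLUME:-esearch}"
# Dry run by default: commands are only printed. Set RUN= (empty) to execute them.
RUN="${RUN-echo}"

disable_perf_xlators() {
    # performance.stat-prefetch is the option that toggles md-cache.
    for opt in performance.stat-prefetch performance.quick-read performance.io-cache; do
        $RUN gluster volume set "$VOLUME" "$opt" off
    done
}

trace_es() {
    # $1 is the PID of the elasticsearch JVM (to be looked up on the client).
    # -f follows all threads, -tt prints timestamps, -T prints time per syscall.
    $RUN strace -f -tt -T -o "/tmp/es-strace.$1.out" -p "$1"
}

disable_perf_xlators
```

With the translators off, the FUSE mount may need a remount before re-running the indexing workload, and trace_es can then be pointed at the elasticsearch PID while the merge failure is reproduced.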