Re: java application crashes while reading a zip file

Dmitry Isakbayev <isakdim@xxxxxxxxx> · Fri, 28 Dec 2018 15:37:39 -0500

These three options (readdir-ahead, write-behind and read-ahead, set to on below) seem to trigger both problems: reading the zip file and renaming files.

Options Reconfigured:
performance.io-cache: off
performance.stat-prefetch: off
performance.quick-read: off
performance.parallel-readdir: off
performance.readdir-ahead: on
performance.write-behind: on
performance.read-ahead: on
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet

On Fri, Dec 28, 2018 at 10:24 AM Dmitry Isakbayev <isakdim@xxxxxxxxx> wrote:
Turning a single option on at a time still worked fine. I will keep trying.
We had used 4.1.5 on KVM/CentOS 7.5 at AWS without these issues or log messages. Do you suppose these issues are triggered by the new environment, or that they did not exist in 4.1.5?

[root@node1 ~]# glusterfs --version
glusterfs 4.1.5

On AWS using
[root@node1 ~]# hostnamectl
   Static hostname: node1
         Icon name: computer-vm
           Chassis: vm
        Machine ID: b30d0f2110ac3807b210c19ede3ce88f
           Boot ID: 52bb159a0aa94043a40e7c7651967bd9
    Virtualization: kvm
  Operating System: CentOS Linux 7 (Core)
       CPE OS Name: cpe:/o:centos:centos:7
            Kernel: Linux 3.10.0-862.3.2.el7.x86_64
      Architecture: x86-64

On Fri, Dec 28, 2018 at 8:56 AM Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> wrote:

On Fri, Dec 28, 2018 at 7:23 PM Dmitry Isakbayev <isakdim@xxxxxxxxx> wrote:
Ok. I will try different options.
This system is scheduled to go into production soon.  What version would you recommend to roll back to?

These are long-standing issues, so rolling back may not make them go away. Instead, if the performance is acceptable to you, please keep these xlators off in production.

On Thu, Dec 27, 2018 at 10:55 PM Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> wrote:

On Fri, Dec 28, 2018 at 3:13 AM Dmitry Isakbayev <isakdim@xxxxxxxxx> wrote:
Raghavendra,

Thanks for the suggestion.

I am using

[root@jl-fanexoss1p glusterfs]# gluster --version
glusterfs 5.0

On 
[root@jl-fanexoss1p glusterfs]# hostnamectl
         Icon name: computer-vm
           Chassis: vm
        Machine ID: e44b8478ef7a467d98363614f4e50535
           Boot ID: eed98992fdda4c88bdd459a89101766b
    Virtualization: vmware
  Operating System: Red Hat Enterprise Linux Server 7.5 (Maipo)
       CPE OS Name: cpe:/o:redhat:enterprise_linux:7.5:GA:server
            Kernel: Linux 3.10.0-862.14.4.el7.x86_64
      Architecture: x86-64

I have configured the following options:

[root@jl-fanexoss1p glusterfs]# gluster volume info
Volume Name: gv0
Type: Replicate
Volume ID: 5ffbda09-c5e2-4abc-b89e-79b5d8a40824
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: jl-fanexoss1p.cspire.net:/data/brick1/gv0
Brick2: sl-fanexoss2p.cspire.net:/data/brick1/gv0
Brick3: nxquorum1p.cspire.net:/data/brick1/gv0
Options Reconfigured:
performance.io-cache: off
performance.stat-prefetch: off
performance.quick-read: off
performance.parallel-readdir: off
performance.readdir-ahead: off
performance.write-behind: off
performance.read-ahead: off
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet

I don't know if it is related, but I am seeing a lot of these messages:
[2018-12-27 20:19:23.776080] W [MSGID: 114031] [client-rpc-fops_v2.c:1932:client4_0_seek_cbk] 2-gv0-client-0: remote operation failed [No such device or address]
[2018-12-27 20:19:47.735190] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler

These messages were introduced by patch [1]. To the best of my knowledge they are benign, though we'll be sending a patch to fix them.

+Mohit Agrawal +Milind Changire. Can you try to identify why we are seeing these messages? If possible, please send a patch to fix this.

[1] https://review.gluster.org/r/I578c3fc67713f4234bd3abbec5d3fbba19059ea5

I am also seeing java.io exceptions when trying to rename files.

When you see the errors, is it possible to collect:
* strace of the java application (strace -ff -v ...)
* fuse-dump of the glusterfs mount (use option --dump-fuse while mounting)?

I also need another favour from you. By trial and error, can you point out which of the many performance xlators you've turned off is causing the issue?

The above two data-points will help us to fix the problem.
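Since turning a single option on at a time did not reproduce the problem, the trial-and-error search has to cover combinations of options. A minimal sketch of a helper that prints the `gluster volume set` commands for every pair and triple of the eight performance options listed in the volume info, with each trial starting from the all-off baseline on volume gv0. The class name `XlatorTrials` is hypothetical, and the helper only prints commands; the zip-reading and rename workload would still have to be rerun after each trial.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the trial-and-error search over performance xlators.
// It only PRINTS `gluster volume set` commands; it does not execute them.
public class XlatorTrials {

    // The performance options that were turned off on volume gv0 in this thread.
    static final String[] OPTIONS = {
        "performance.io-cache",
        "performance.stat-prefetch",
        "performance.quick-read",
        "performance.parallel-readdir",
        "performance.readdir-ahead",
        "performance.write-behind",
        "performance.read-ahead",
        "performance.client-io-threads",
    };

    // Returns every subset of `options` with exactly `k` elements, in index order.
    static List<List<String>> combinations(String[] options, int k) {
        List<List<String>> out = new ArrayList<>();
        collect(options, k, 0, new ArrayList<>(), out);
        return out;
    }

    // Classic backtracking: extend `current` with options at index >= start.
    private static void collect(String[] options, int k, int start,
                                List<String> current, List<List<String>> out) {
        if (current.size() == k) {
            out.add(new ArrayList<>(current));
            return;
        }
        for (int i = start; i < options.length; i++) {
            current.add(options[i]);
            collect(options, k, i + 1, current, out);
            current.remove(current.size() - 1); // undo before trying the next option
        }
    }

    public static void main(String[] args) {
        String volume = "gv0";
        int trial = 0;
        // Single options were already tried, so start with pairs.
        for (int k = 2; k <= 3; k++) {
            for (List<String> subset : combinations(OPTIONS, k)) {
                trial++;
                System.out.println("# trial " + trial + ": enable " + subset);
                // Every other option stays off, matching the known-good baseline.
                for (String option : OPTIONS) {
                    String value = subset.contains(option) ? "on" : "off";
                    System.out.println("gluster volume set " + volume + " " + option + " " + value);
                }
            }
        }
    }
}
```

Eight options give 28 pairs and 56 triples, i.e. 84 trials, so narrowing `OPTIONS` to the likeliest suspects first keeps the search manageable.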

Thank You,
Dmitry

On Thu, Dec 27, 2018 at 3:48 PM Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> wrote:
What version of glusterfs are you using? It might be either
* a stale metadata issue. 
* inconsistent ctime issue.

Can you try turning off all performance xlators? If it is the first (stale metadata), that should help.
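One way to probe the stale-metadata theory from the application side is to read the same file several times and check that the size reported by stat matches the number of bytes actually read, and that size, modification time and content checksum stay identical across reads. A minimal sketch; the class `ZipChangeCheck` and its methods are hypothetical. It uses only standard `java.nio.file` and `java.util.zip.CRC32` calls, and it compares modification time rather than ctime because `BasicFileAttributes` does not expose ctime.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.zip.CRC32;

// Hypothetical diagnostic: read a file several times and check that stat()
// and the actual contents agree with each other and across reads.
public class ZipChangeCheck {

    // What the mount presented for the file during one read.
    static final class Snapshot {
        final long statSize;    // size reported by the file's attributes
        final long bytesRead;   // number of bytes actually read until EOF
        final long mtimeMillis; // last-modified time reported by the mount
        final long crc32;       // checksum of the bytes actually read

        Snapshot(long statSize, long bytesRead, long mtimeMillis, long crc32) {
            this.statSize = statSize;
            this.bytesRead = bytesRead;
            this.mtimeMillis = mtimeMillis;
            this.crc32 = crc32;
        }

        // Stale size metadata shows up as stat disagreeing with the read.
        boolean selfConsistent() {
            return statSize == bytesRead;
        }

        boolean sameAs(Snapshot o) {
            return statSize == o.statSize && bytesRead == o.bytesRead
                    && mtimeMillis == o.mtimeMillis && crc32 == o.crc32;
        }

        @Override
        public String toString() {
            return String.format("stat size=%d, bytes read=%d, mtime=%d, crc32=%08x",
                    statSize, bytesRead, mtimeMillis, crc32);
        }
    }

    static Snapshot snapshot(Path path) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(path, BasicFileAttributes.class);
        byte[] content = Files.readAllBytes(path); // keeps reading until EOF
        CRC32 crc = new CRC32();
        crc.update(content, 0, content.length);
        return new Snapshot(attrs.size(), content.length,
                attrs.lastModifiedTime().toMillis(), crc.getValue());
    }

    // True if every read was self-consistent and identical to the first one.
    static boolean isStable(Path path, int cycles) throws IOException {
        Snapshot first = snapshot(path);
        System.out.println("cycle 1: " + first);
        boolean stable = first.selfConsistent();
        for (int i = 2; i <= cycles; i++) {
            Snapshot next = snapshot(path);
            System.out.println("cycle " + i + ": " + next);
            stable &= next.selfConsistent() && next.sameAs(first);
        }
        return stable;
    }

    public static void main(String[] args) throws IOException {
        boolean stable = isStable(Paths.get(args[0]), 3);
        System.out.println(stable ? "file looked identical on every read"
                                  : "file contents or metadata changed between reads");
    }
}
```

Run against the zip file on the Gluster mount (the path is passed as the first argument), a mismatch for a file that nothing else writes would point at the mount rather than at the application.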

On Fri, Dec 28, 2018 at 1:51 AM Dmitry Isakbayev <isakdim@xxxxxxxxx> wrote:
I attempted to set `performance.read-ahead off` following https://jira.apache.org/jira/browse/AMQ-7041. That did not help.

On Mon, Dec 24, 2018 at 2:11 PM Dmitry Isakbayev <isakdim@xxxxxxxxx> wrote:
The core file generated by the JVM suggests that the crash happens because the file is changing while it is being read (see https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8186557).
The application reads in the zip file and goes through the zip entries, then reloads the file and goes through the zip entries again. It does so 3 times. The application never crashes on the 1st cycle but sometimes crashes on the 2nd or 3rd cycle.
The zip file is generated about 20 seconds prior to it being used and is not updated or even used by any other application.  I have never seen this problem on a plain file system.

I would appreciate any suggestions on how to go about debugging this issue. I can change the source code of the java application.
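Since the source code can be changed, one mitigation worth trying, assuming the crash comes from the JVM's native handling of the archive behind `java.util.zip.ZipFile` (as the core file and the linked JDK bug suggest), is to copy the whole archive into the Java heap on each cycle and walk its entries with `ZipInputStream`. A minimal sketch; the class `InMemoryZipReader` and its methods are hypothetical. It does not fix the underlying Gluster behaviour: if the mount serves inconsistent bytes, the expected result is an `IOException` (for example a `ZipException` on a CRC mismatch) that the application can catch and retry, rather than a JVM crash.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Sketch of a workaround: never let the zip reader touch the file on the
// mount directly. Each cycle copies the archive into memory first, so later
// changes to the file on disk cannot affect a cycle already in progress.
public class InMemoryZipReader {

    // Reads the archive once and returns the names of all its entries.
    static List<String> listEntries(Path zipPath) throws IOException {
        byte[] bytes = Files.readAllBytes(zipPath);
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(bytes))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                // Drain the entry: ZipInputStream checks the entry CRC at the
                // end, so corrupt data surfaces here as a ZipException.
                while (zis.read(buffer) != -1) {
                    // discard; the real application would process the data here
                }
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path zipPath = Paths.get(args[0]);
        // Mirrors the application's pattern: load the archive and walk its
        // entries three times in a row.
        for (int cycle = 1; cycle <= 3; cycle++) {
            List<String> names = listEntries(zipPath);
            System.out.println("cycle " + cycle + ": " + names.size() + " entries");
        }
    }
}
```

Note that `ZipInputStream` walks the local entry headers sequentially instead of using the central directory, so it suits archives written in one pass; the cost is holding each archive fully in memory.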

Regards,
Dmitry

_______________________________________________

Gluster-users mailing list

Gluster-users@xxxxxxxxxxx

https://lists.gluster.org/mailman/listinfo/gluster-users
