Hmm. I have several 3.12.9 volumes, with 3.12.9 clients, that are dropping the mount even though parallel-readdir is off. This is only happening on the RDMA transport; the TCP transport mounts are fine.
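(For reference, the dump below was presumably gathered with "gluster volume get <volname> all"; checking the readdir option and the transport type from the CLI would look roughly like this — a sketch, with <volname> standing in for the affected volume:)

$ gluster volume get <volname> performance.parallel-readdir
$ gluster volume info <volname> | grep -i transport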
Option Value
------ -----
cluster.lookup-unhashed on
cluster.lookup-optimize off
cluster.min-free-disk 10%
cluster.min-free-inodes 5%
cluster.rebalance-stats off
cluster.subvols-per-directory (null)
cluster.readdir-optimize off
cluster.rsync-hash-regex (null)
cluster.extra-hash-regex (null)
cluster.dht-xattr-name trusted.glusterfs.dht
cluster.randomize-hash-range-by-gfid off
cluster.rebal-throttle normal
cluster.lock-migration off
cluster.local-volume-name (null)
cluster.weighted-rebalance on
cluster.switch-pattern (null)
cluster.entry-change-log on
cluster.read-subvolume (null)
cluster.read-subvolume-index -1
cluster.read-hash-mode 1
cluster.background-self-heal-count 8
cluster.metadata-self-heal on
cluster.data-self-heal on
cluster.entry-self-heal on
cluster.self-heal-daemon enable
cluster.heal-timeout 600
cluster.self-heal-window-size 1
cluster.data-change-log on
cluster.metadata-change-log on
cluster.data-self-heal-algorithm (null)
cluster.eager-lock on
disperse.eager-lock on
cluster.quorum-type none
cluster.quorum-count (null)
cluster.choose-local true
cluster.self-heal-readdir-size 1KB
cluster.post-op-delay-secs 1
cluster.ensure-durability on
cluster.consistent-metadata no
cluster.heal-wait-queue-length 128
cluster.favorite-child-policy none
cluster.stripe-block-size 128KB
cluster.stripe-coalesce true
diagnostics.latency-measurement off
diagnostics.dump-fd-stats off
diagnostics.count-fop-hits off
diagnostics.brick-log-level INFO
diagnostics.client-log-level INFO
diagnostics.brick-sys-log-level CRITICAL
diagnostics.client-sys-log-level CRITICAL
diagnostics.brick-logger (null)
diagnostics.client-logger (null)
diagnostics.brick-log-format (null)
diagnostics.client-log-format (null)
diagnostics.brick-log-buf-size 5
diagnostics.client-log-buf-size 5
diagnostics.brick-log-flush-timeout 120
diagnostics.client-log-flush-timeout 120
diagnostics.stats-dump-interval 0
diagnostics.fop-sample-interval 0
diagnostics.stats-dump-format json
diagnostics.fop-sample-buf-size 65535
diagnostics.stats-dnscache-ttl-sec 86400
performance.cache-max-file-size 0
performance.cache-min-file-size 0
performance.cache-refresh-timeout 1
performance.cache-priority
performance.cache-size 32MB
performance.io-thread-count 16
performance.high-prio-threads 16
performance.normal-prio-threads 16
performance.low-prio-threads 16
performance.least-prio-threads 1
performance.enable-least-priority on
performance.cache-size 128MB
performance.flush-behind on
performance.nfs.flush-behind on
performance.write-behind-window-size 1MB
performance.resync-failed-syncs-after-fsync off
performance.nfs.write-behind-window-size 1MB
performance.strict-o-direct off
performance.nfs.strict-o-direct off
performance.strict-write-ordering off
performance.nfs.strict-write-ordering off
performance.lazy-open yes
performance.read-after-open no
performance.read-ahead-page-count 4
performance.md-cache-timeout 1
performance.cache-swift-metadata true
performance.cache-samba-metadata false
performance.cache-capability-xattrs true
performance.cache-ima-xattrs true
features.encryption off
encryption.master-key (null)
encryption.data-key-size 256
encryption.block-size 4096
network.frame-timeout 1800
network.ping-timeout 42
network.tcp-window-size (null)
features.lock-heal off
features.grace-timeout 10
network.remote-dio disable
client.event-threads 2
client.tcp-user-timeout 0
client.keepalive-time 20
client.keepalive-interval 2
client.keepalive-count 9
network.tcp-window-size (null)
network.inode-lru-limit 16384
auth.allow *
auth.reject (null)
transport.keepalive 1
server.allow-insecure (null)
server.root-squash off
server.anonuid 65534
server.anongid 65534
server.statedump-path /var/run/gluster
server.outstanding-rpc-limit 64
features.lock-heal off
features.grace-timeout 10
server.ssl (null)
auth.ssl-allow *
server.manage-gids off
server.dynamic-auth on
client.send-gids on
server.gid-timeout 300
server.own-thread (null)
server.event-threads 1
server.tcp-user-timeout 0
server.keepalive-time 20
server.keepalive-interval 2
server.keepalive-count 9
transport.listen-backlog 10
ssl.own-cert (null)
ssl.private-key (null)
ssl.ca-list (null)
ssl.crl-path (null)
ssl.certificate-depth (null)
ssl.cipher-list (null)
ssl.dh-param (null)
ssl.ec-curve (null)
performance.write-behind on
performance.read-ahead on
performance.readdir-ahead on
performance.io-cache on
performance.quick-read on
performance.open-behind on
performance.nl-cache off
performance.stat-prefetch on
performance.client-io-threads on
performance.nfs.write-behind on
performance.nfs.read-ahead off
performance.nfs.io-cache off
performance.nfs.quick-read off
performance.nfs.stat-prefetch off
performance.nfs.io-threads off
performance.force-readdirp true
performance.cache-invalidation false
features.uss off
features.snapshot-directory .snaps
features.show-snapshot-directory off
network.compression off
network.compression.window-size -15
network.compression.mem-level 8
network.compression.min-size 0
network.compression.compression-level -1
network.compression.debug false
features.limit-usage (null)
features.default-soft-limit 80%
features.soft-timeout 60
features.hard-timeout 5
features.alert-time 86400
features.quota-deem-statfs off
geo-replication.indexing off
geo-replication.indexing off
geo-replication.ignore-pid-check off
geo-replication.ignore-pid-check off
features.quota off
features.inode-quota off
features.bitrot disable
debug.trace off
debug.log-history no
debug.log-file no
debug.exclude-ops (null)
debug.include-ops (null)
debug.error-gen off
debug.error-failure (null)
debug.error-number (null)
debug.random-failure off
debug.error-fops (null)
nfs.disable off
features.read-only off
features.worm off
features.worm-file-level off
features.default-retention-period 120
features.retention-mode relax
features.auto-commit-period 180
storage.linux-aio off
storage.batch-fsync-mode reverse-fsync
storage.batch-fsync-delay-usec 0
storage.owner-uid -1
storage.owner-gid -1
storage.node-uuid-pathinfo off
storage.health-check-interval 30
storage.build-pgfid off
storage.gfid2path on
storage.gfid2path-separator :
storage.bd-aio off
cluster.server-quorum-type off
cluster.server-quorum-ratio 0
changelog.changelog off
changelog.changelog-dir (null)
changelog.encoding ascii
changelog.rollover-time 15
changelog.fsync-interval 5
changelog.changelog-barrier-timeout 120
changelog.capture-del-path off
features.barrier disable
features.barrier-timeout 120
features.trash off
features.trash-dir .trashcan
features.trash-eliminate-path (null)
features.trash-max-filesize 5MB
features.trash-internal-op off
cluster.enable-shared-storage disable
cluster.write-freq-threshold 0
cluster.read-freq-threshold 0
cluster.tier-pause off
cluster.tier-promote-frequency 120
cluster.tier-demote-frequency 3600
cluster.watermark-hi 90
cluster.watermark-low 75
cluster.tier-mode cache
cluster.tier-max-promote-file-size 0
cluster.tier-max-mb 4000
cluster.tier-max-files 10000
cluster.tier-query-limit 100
cluster.tier-compact on
cluster.tier-hot-compact-frequency 604800
cluster.tier-cold-compact-frequency 604800
features.ctr-enabled off
features.record-counters off
features.ctr-record-metadata-heat off
features.ctr_link_consistency off
features.ctr_lookupheal_link_timeout 300
features.ctr_lookupheal_inode_timeout 300
features.ctr-sql-db-cachesize 12500
features.ctr-sql-db-wal-autocheckpoint 25000
features.selinux on
locks.trace off
locks.mandatory-locking off
cluster.disperse-self-heal-daemon enable
cluster.quorum-reads no
client.bind-insecure (null)
features.shard off
features.shard-block-size 64MB
features.scrub-throttle lazy
features.scrub-freq biweekly
features.scrub false
features.expiry-time 120
features.cache-invalidation off
features.cache-invalidation-timeout 60
features.leases off
features.lease-lock-recall-timeout 60
disperse.background-heals 8
disperse.heal-wait-qlength 128
cluster.heal-timeout 600
dht.force-readdirp on
disperse.read-policy gfid-hash
cluster.shd-max-threads 1
cluster.shd-wait-qlength 1024
cluster.locking-scheme full
cluster.granular-entry-heal no
features.locks-revocation-secs 0
features.locks-revocation-clear-all false
features.locks-revocation-max-blocked 0
features.locks-monkey-unlocking false
disperse.shd-max-threads 1
disperse.shd-wait-qlength 1024
disperse.cpu-extensions auto
disperse.self-heal-window-size 1
cluster.use-compound-fops off
performance.parallel-readdir off
performance.rda-request-size 131072
performance.rda-low-wmark 4096
performance.rda-high-wmark 128KB
performance.rda-cache-limit 10MB
performance.nl-cache-positive-entry false
performance.nl-cache-limit 10MB
performance.nl-cache-timeout 60
cluster.brick-multiplex off
cluster.max-bricks-per-process 0
disperse.optimistic-change-log on
cluster.halo-enabled False
cluster.halo-shd-max-latency 99999
cluster.halo-nfsd-max-latency 5
cluster.halo-max-latency 5
cluster.halo-max-replicas 99999
cluster.halo-min-replicas 2
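(Since the drops only show up on the RDMA interface here, one low-effort check is to mount the same volume on the same client over both transports and confirm that only the RDMA mount dies. A sketch; <server>, <volname> and the mount points are placeholders, and it assumes the transport mount option is available in this mount.glusterfs build:)

$ mount -t glusterfs -o transport=rdma <server>:/<volname> /mnt/test-rdma
$ mount -t glusterfs -o transport=tcp <server>:/<volname> /mnt/test-tcp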
On Thu, 2018-06-14 at 12:12 +0100, mohammad kashif wrote:
Kashif

Thanks

Servers have 64GB RAM and clients have 64GB to 192GB RAM. I tested with a 192GB RAM client and it still had the same issue.

Hi Nithya

It seems the problem can be solved either by turning parallel-readdir off or by downgrading the client to 3.10.12-1. Yesterday I downgraded some clients to 3.10.12-1 and that seems to have fixed the problem. Today, when I saw your email, I disabled parallel-readdir and the current 3.12.9-1 client started to work.

I upgraded the server and clients to 3.12.9-1 last month, and since then clients had been unmounting intermittently, about once a week. During the last three days it started unmounting every few minutes. I don't know what triggered this sudden panic, except that the file system was quite full: around 98%. It is a 480 TB file system with almost 80 million files.
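(A rough sketch of the two workarounds described above; the volume name matches the info below, but the downgrade package list and the .el6 version suffix are assumptions based on the versions mentioned in this thread:)

$ gluster volume set atlasglust performance.parallel-readdir off
# or, on an affected EL6 client, roll the client packages back:
$ yum downgrade glusterfs-3.10.12-1.el6 glusterfs-fuse-3.10.12-1.el6 glusterfs-libs-3.10.12-1.el6 glusterfs-client-xlators-3.10.12-1.el6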
Volume Name: atlasglust
Type: Distribute
Volume ID: fbf0ebb8-deab-4388-9d8a-f722618a624b
Status: Started
Snapshot Count: 0
Number of Bricks: 7
Transport-type: tcp
Bricks:
Brick1: pplxgluster01.X.Y.Z:/glusteratlas/brick001/gv0
Brick2: pplxgluster02.X.Y.Z:/glusteratlas/brick002/gv0
Brick3: pplxgluster03.X.Y.Z:/glusteratlas/brick003/gv0
Brick4: pplxgluster04.X.Y.Z:/glusteratlas/brick004/gv0
Brick5: pplxgluster05.X.Y.Z:/glusteratlas/brick005/gv0
Brick6: pplxgluster06.X.Y.Z:/glusteratlas/brick006/gv0
Brick7: pplxgluster07.X.Y.Z:/glusteratlas/brick007/gv0
Options Reconfigured:
diagnostics.client-log-level: ERROR
diagnostics.brick-log-level: ERROR
performance.cache-invalidation: on
server.event-threads: 4
client.event-threads: 4
cluster.lookup-optimize: on
performance.client-io-threads: on
performance.cache-size: 1GB
performance.parallel-readdir: off
performance.md-cache-timeout: 600
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
auth.allow: X.Y.Z.*
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

On Thu, Jun 14, 2018 at 5:39 AM, Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:

+Poornima who works on parallel-readdir.
@Poornima, have you seen anything like this before?

On 14 June 2018 at 10:07, Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:

This is not the same issue as the one you are referring to - that was in the RPC layer and caused the bricks to crash. This one is different, as it seems to be in the dht and rda layers. It does look like a stack overflow though.

@Mohammad,
Please send the following information:
1. gluster volume info
2. The number of entries in the directory being listed
3. System memory
Does this still happen if you turn off parallel-readdir?
Regards,
Nithya

On 13 June 2018 at 16:40, Milind Changire <mchangir@xxxxxxxxxx> wrote:

+Nithya
Nithya,
Do these logs [1] look similar to the recursive readdir() issue that you encountered just a while back? i.e. the recursive readdir() response definition in the XDR.

On Wed, Jun 13, 2018 at 4:29 PM, mohammad kashif <kashif.alig@xxxxxxxxx> wrote:

Hi Milind
Thanks a lot, I managed to run gdb and produced a traceback as well. It's here.
I am trying to understand it but still not able to make sense out of it.
Thanks
Kashif

On Wed, Jun 13, 2018 at 11:34 AM, Milind Changire <mchangir@xxxxxxxxxx> wrote:
On Wed, Jun 13, 2018 at 3:21 PM, mohammad kashif <kashif.alig@xxxxxxxxx> wrote:

Hi Milind
There is no glusterfs-debuginfo available for gluster-3.12 from the http://mirror.centos.org/centos/6/storage/x86_64/gluster-3.1 repo. Do you know from where I can get it?
2/ Also, when I run gdb, it says:
Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-3.12.9-1.el6.x86_64
I can't find a debug package for glusterfs-fuse either.
Thanks from the pit of despair ;)
Kashif

On Tue, Jun 12, 2018 at 5:01 PM, mohammad kashif <kashif.alig@xxxxxxxxx> wrote:

Hi Milind
I will send you links for the logs. I collected these core dumps at the client, and there is no glusterd process running on the client.
Kashif

On Tue, Jun 12, 2018 at 4:14 PM, Milind Changire <mchangir@xxxxxxxxxx> wrote:

Kashif,
Could you also send over the client/mount log file as Vijay suggested? Or maybe the lines with the crash backtrace.
Also, you've mentioned that you straced glusterd, but when you ran gdb, you ran it over /usr/sbin/glusterfs.

On Tue, Jun 12, 2018 at 8:19 PM, Vijay Bellur <vbellur@xxxxxxxxxx> wrote:
On Tue, Jun 12, 2018 at 7:40 AM, mohammad kashif <kashif.alig@xxxxxxxxx> wrote:

Hi Milind
The operating system is Scientific Linux 6, which is based on RHEL 6. The CPU arch is Intel x86_64.
I will send you a separate email with a link to the core dump.

You could also grep for crash in the client log file; the lines following the crash would have a backtrace in most cases.
HTH,
Vijay

Thanks for your help.
Kashif

On Tue, Jun 12, 2018 at 3:16 PM, Milind Changire <mchangir@xxxxxxxxxx> wrote:

Kashif,
Could you share the core dump via Google Drive or something similar?
Also, let me know the CPU arch and OS distribution on which you are running gluster.
If you've installed the glusterfs-debuginfo package, you'll also get the source lines in the backtrace via gdb.

On Tue, Jun 12, 2018 at 5:59 PM, mohammad kashif <kashif.alig@xxxxxxxxx> wrote:

Hi Milind, Vijay
Thanks, I have some more information now, as I straced glusterd on the client:

138544 0.000131 mprotect(0x7f2f70785000, 4096, PROT_READ|PROT_WRITE) = 0 <0.000026>
138544 0.000128 mprotect(0x7f2f70786000, 4096, PROT_READ|PROT_WRITE) = 0 <0.000027>
138544 0.000126 mprotect(0x7f2f70787000, 4096, PROT_READ|PROT_WRITE) = 0 <0.000027>
138544 0.000124 --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x7f2f7c60ef88} ---
138544 0.000051 --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
138551 0.105048 +++ killed by SIGSEGV (core dumped) +++
138550 0.000041 +++ killed by SIGSEGV (core dumped) +++
138547 0.000008 +++ killed by SIGSEGV (core dumped) +++
138546 0.000007 +++ killed by SIGSEGV (core dumped) +++
138545 0.000007 +++ killed by SIGSEGV (core dumped) +++
138544 0.000008 +++ killed by SIGSEGV (core dumped) +++
138543 0.000007 +++ killed by SIGSEGV (core dumped) +++

As far as I understand, gluster is somehow trying to access memory in an inappropriate manner and the kernel sends SIGSEGV.
I also got the core dump. I am trying gdb for the first time, so I am not sure whether I am using it correctly:
gdb /usr/sbin/glusterfs core.138536
It just tells me that the program terminated with signal 11, segmentation fault.
The problem is not limited to one client but is happening on many clients. I will really appreciate any help, as the whole file system has become unusable.
Thanks
Kashif

On Tue, Jun 12, 2018 at 12:26 PM, Milind Changire <mchangir@xxxxxxxxxx> wrote:

Kashif,
You can change the log level by:
$ gluster volume set <vol> diagnostics.brick-log-level TRACE
$ gluster volume set <vol> diagnostics.client-log-level TRACE
and see how things fare.
If you want fewer logs you can change the log level to DEBUG instead of TRACE.

On Tue, Jun 12, 2018 at 3:37 PM, mohammad kashif <kashif.alig@xxxxxxxxx> wrote:

Hi Vijay
Now it is unmounting every 30 mins!
The server log at /var/log/glusterfs/bricks/glusteratlas-brics001-gv0.log has this line only:
[2018-06-12 09:53:19.303102] I [MSGID: 115013] [server-helpers.c:289:do_fd_cleanup] 0-atlasglust-server: fd cleanup on /atlas/atlasdata/zgubic/hmumu/histograms/v14.3/Signal
[2018-06-12 09:53:19.306190] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-atlasglust-server: Shutting down connection <server-name>-2224879-2018/06/12-09:51:01:460889-atlasglust-client-0-0-0

There is no other information. Is there any way to increase log verbosity?

On the client:
[2018-06-12 09:51:01.744980] I [MSGID: 114057] [client-handshake.c:1478:select_server_supported_programs] 0-atlasglust-client-5: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2018-06-12 09:51:01.746508] I [MSGID: 114046] [client-handshake.c:1231:client_setvolume_cbk] 0-atlasglust-client-5: Connected to atlasglust-client-5, attached to remote volume '/glusteratlas/brick006/gv0'.
[2018-06-12 09:51:01.746543] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-atlasglust-client-5: Server and Client lk-version numbers are not same, reopening the fds
[2018-06-12 09:51:01.746814] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-atlasglust-client-5: Server lk version = 1
[2018-06-12 09:51:01.748449] I [MSGID: 114057] [client-handshake.c:1478:select_server_supported_programs] 0-atlasglust-client-6: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2018-06-12 09:51:01.750219] I [MSGID: 114046] [client-handshake.c:1231:client_setvolume_cbk] 0-atlasglust-client-6: Connected to atlasglust-client-6, attached to remote volume '/glusteratlas/brick007/gv0'.
[2018-06-12 09:51:01.750261] I [MSGID: 114047] [client-handshake.c:1242:client_setvolume_cbk] 0-atlasglust-client-6: Server and Client lk-version numbers are not same, reopening the fds
[2018-06-12 09:51:01.750503] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-atlasglust-client-6: Server lk version = 1
[2018-06-12 09:51:01.752207] I [fuse-bridge.c:4205:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.14
[2018-06-12 09:51:01.752261] I [fuse-bridge.c:4835:fuse_graph_sync] 0-fuse: switched to graph 0

Is there a problem with the server and client lk-version?
Thanks for your help.
Kashif

On Mon, Jun 11, 2018 at 11:52 PM, Vijay Bellur <vbellur@xxxxxxxxxx> wrote:
On Mon, Jun 11, 2018 at 8:50 AM, mohammad kashif <kashif.alig@xxxxxxxxx> wrote:

Hi
Since I updated our gluster server and clients to the latest version, 3.12.9-1, I am having this issue of gluster getting unmounted from clients very regularly. It was not a problem before the update.
It is a distributed file system with no replication. We have seven servers totaling around 480 TB of data. It is 97% full.
I am using the following config on the server:

gluster volume set atlasglust features.cache-invalidation on
gluster volume set atlasglust features.cache-invalidation-timeout 600
gluster volume set atlasglust performance.stat-prefetch on
gluster volume set atlasglust performance.cache-invalidation on
gluster volume set atlasglust performance.md-cache-timeout 600
gluster volume set atlasglust performance.parallel-readdir on
gluster volume set atlasglust performance.cache-size 1GB
gluster volume set atlasglust performance.client-io-threads on
gluster volume set atlasglust cluster.lookup-optimize on
gluster volume set atlasglust performance.stat-prefetch on
gluster volume set atlasglust client.event-threads 4
gluster volume set atlasglust server.event-threads 4

Clients are mounted with these options:
defaults,direct-io-mode=disable,attribute-timeout=600,entry-timeout=600,negative-timeout=600,fopen-keep-cache,rw,_netdev

I can't see anything in the log file. Can someone suggest how to troubleshoot this issue?

Can you please share the log file? Checking for messages related to disconnections/crashes in the log file would be a good way to start troubleshooting the problem.

Thanks,
Vijay
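(Following Vijay's suggestion above, pulling a crash backtrace out of a FUSE client usually looks roughly like this — a sketch; the client log name is derived from the mount point and the core file name from the crashed PID, so both are placeholders here:)

$ grep -A 30 'signal received' /var/log/glusterfs/<mount-point-path>.log
$ gdb /usr/sbin/glusterfs core.<pid>
(gdb) bt
(gdb) thread apply all bt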
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
--
James P. Kinney III

Every time you stop a school, you will have to build a jail. What you gain at one end you lose at the other. It's like feeding a dog on his own tail. It won't fatten the dog.
- Speech 11/23/1900, Mark Twain

http://heretothereideas.blogspot.com/