Re: Message repeated over and over after upgrade from 4.1 to 5.3: W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]

On Tue, Feb 12, 2019 at 10:24 AM Artem Russakovskii <archon810@xxxxxxxxx> wrote:
Great job identifying the issue!

Any ETA on the next release with the logging and crash fixes in it?

I've marked the write-behind corruption as a blocker for release-6. The logging fixes are already in the codebase.


On Mon, Feb 11, 2019, 7:19 PM Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> wrote:


On Mon, Feb 11, 2019 at 3:49 PM João Baúto <joao.bauto@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
Although I don't have these error messages, I'm having fuse crashes as frequently as you. I have disabled write-behind, and the mount has been running over the weekend with heavy usage and no issues.

The issue you are facing will likely be fixed by patch [1]. Xavi, Nithya and I were able to identify the corruption in write-behind.



I can provide core dumps from before I disabled write-behind, if needed. I opened a BZ report with the crashes I was having.

João Baúto
---------------
Scientific Computing and Software Platform
Champalimaud Research
Champalimaud Center for the Unknown
Av. Brasília, Doca de Pedrouços
1400-038 Lisbon, Portugal

fchampalimaud.org


Artem Russakovskii <archon810@xxxxxxxxx> wrote on Saturday, 9/02/2019 at 22:18:
Alright. I've enabled core-dumping (hopefully), so now I'm waiting for the next crash to see if it dumps a core for you guys to remotely debug.
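
Once a core shows up, I assume something like the following would let you poke at it remotely (a guess on my part; if openSUSE 15.0 routes cores through systemd-coredump this should work, otherwise gdb against /usr/sbin/glusterfs and the raw core file does the same job):

# list captured glusterfs cores, then open the most recent one under gdb
coredumpctl list glusterfs
coredumpctl gdb glusterfs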

Then I can consider setting performance.write-behind to off and monitoring for further crashes.

Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC


On Fri, Feb 8, 2019 at 7:22 PM Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> wrote:


On Sat, Feb 9, 2019 at 12:53 AM Artem Russakovskii <archon810@xxxxxxxxx> wrote:
Hi Nithya,

I can try to disable write-behind as long as it doesn't heavily impact performance for us. Which option is it exactly? I don't see it set in my list of changed volume variables that I sent you guys earlier.

The option is performance.write-behind
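
A minimal sketch of toggling it, with <VOLNAME> standing in for your volume name (the write performance impact depends on the workload, so please measure before and after):

# check the current value, then turn write-behind off
gluster volume get <VOLNAME> performance.write-behind
gluster volume set <VOLNAME> performance.write-behind off

# to turn it back on later
gluster volume set <VOLNAME> performance.write-behind on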


Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC


On Fri, Feb 8, 2019 at 4:57 AM Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:
Hi Artem,

We have found the cause of one crash. Unfortunately, we have not managed to reproduce the one you reported, so we don't know if it is the same cause.

Can you disable write-behind on the volume and let us know if it solves the problem? If yes, it is likely to be the same issue.


regards,
Nithya

On Fri, 8 Feb 2019 at 06:51, Artem Russakovskii <archon810@xxxxxxxxx> wrote:
Sorry to disappoint, but the crash just happened again, so lru-limit=0 didn't help.

Here's the snippet of the crash and the subsequent remount by monit.


[2019-02-08 01:13:05.854391] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7f4402b99329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7f4402daaaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7f440b6b5218] ) 0-dict: dict is NULL [Invalid argument]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 0-<SNIP>_data1-replicate-0: selecting local read_child <SNIP>_data1-client-3" repeated 39 times between [2019-02-08 01:11:18.043286] and [2019-02-08 01:13:07.915604]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 515 times between [2019-02-08 01:11:17.932515] and [2019-02-08 01:13:09.311554]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
signal received: 6
time of crash: 
2019-02-08 01:13:09
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7f440b6c064c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7f440b6cacb6]
/lib64/libc.so.6(+0x36160)[0x7f440a887160]
/lib64/libc.so.6(gsignal+0x110)[0x7f440a8870e0]
/lib64/libc.so.6(abort+0x151)[0x7f440a8886c1]
/lib64/libc.so.6(+0x2e6fa)[0x7f440a87f6fa]
/lib64/libc.so.6(+0x2e772)[0x7f440a87f772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7f440ac150b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7f44036f8c9d]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7f440370bba1]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7f4403990f3f]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7f440b48b820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7f440b48bb6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f440b488063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7f44050a80b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7f440b71e4c3]
/lib64/libpthread.so.0(+0x7559)[0x7f440ac12559]
/lib64/libc.so.6(clone+0x3f)[0x7f440a94981f]
---------
[2019-02-08 01:13:35.628478] I [MSGID: 100030] [glusterfsd.c:2715:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 5.3 (args: /usr/sbin/glusterfs --lru-limit=0 --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP>_data1 /mnt/<SNIP>_data1)
[2019-02-08 01:13:35.637830] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-02-08 01:13:35.651405] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2019-02-08 01:13:35.651628] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 3
[2019-02-08 01:13:35.651747] I [MSGID: 101190] [event-epoll.c:622:event_dispatch_epoll_worker] 0-epoll: Started thread with index 4
[2019-02-08 01:13:35.652575] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-0: parent translators are ready, attempting connect on transport
[2019-02-08 01:13:35.652978] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-1: parent translators are ready, attempting connect on transport
[2019-02-08 01:13:35.655197] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-2: parent translators are ready, attempting connect on transport
[2019-02-08 01:13:35.655497] I [MSGID: 114020] [client.c:2354:notify] 0-<SNIP>_data1-client-3: parent translators are ready, attempting connect on transport
[2019-02-08 01:13:35.655527] I [rpc-clnt.c:2042:rpc_clnt_reconfig] 0-<SNIP>_data1-client-0: changing port to 49153 (from 0)
Final graph:


Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC


On Thu, Feb 7, 2019 at 1:28 PM Artem Russakovskii <archon810@xxxxxxxxx> wrote:
I've added the lru-limit=0 parameter to the mounts, and I see it's taken effect correctly:
"/usr/sbin/glusterfs --lru-limit=0 --process-name fuse --volfile-server=localhost --volfile-id=/<SNIP>  /mnt/<SNIP>"

Let's see if it stops crashing or not.

Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC


On Wed, Feb 6, 2019 at 10:48 AM Artem Russakovskii <archon810@xxxxxxxxx> wrote:
Hi Nithya,

Indeed, I upgraded from 4.1 to 5.3, at which point I started seeing crashes, and no further releases have been made yet.

volume info:
Type: Replicate
Volume ID: ****SNIP****
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: ****SNIP****
Brick2: ****SNIP****
Brick3: ****SNIP****
Brick4: ****SNIP****
Options Reconfigured:
cluster.quorum-count: 1
cluster.quorum-type: fixed
network.ping-timeout: 5
network.remote-dio: enable
performance.rda-cache-limit: 256MB
performance.readdir-ahead: on
performance.parallel-readdir: on
network.inode-lru-limit: 500000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
cluster.readdir-optimize: on
performance.io-thread-count: 32
server.event-threads: 4
client.event-threads: 4
performance.read-ahead: off
cluster.lookup-optimize: on
performance.cache-size: 1GB
cluster.self-heal-daemon: enable
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
cluster.granular-entry-heal: enable
cluster.data-self-heal-algorithm: full

Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC


On Wed, Feb 6, 2019 at 12:20 AM Nithya Balachandran <nbalacha@xxxxxxxxxx> wrote:
Hi Artem,

Do you still see the crashes with 5.3? If yes, please try mounting the volume with the mount option lru-limit=0 and see if that helps. We are looking into the crashes and will update when we have a fix.
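
For example, something along these lines (a sketch only; substitute your own server, volume name and mount point, and confirm that your mount.glusterfs version passes the option through to the fuse client):

# one-off mount with the inode lru limit disabled
mount -t glusterfs -o lru-limit=0 localhost:/<VOLNAME> /mnt/<VOLNAME>

# or as an /etc/fstab entry
localhost:/<VOLNAME>  /mnt/<VOLNAME>  glusterfs  defaults,lru-limit=0  0 0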

Also, please provide the gluster volume info for the volume in question.


regards,
Nithya

On Tue, 5 Feb 2019 at 05:31, Artem Russakovskii <archon810@xxxxxxxxx> wrote:
The fuse crash happened two more times, but this time monit helped recover within 1 minute, so it's a great workaround for now.

What's odd is that the crashes are only happening on one of 4 servers, and I don't know why.

Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC


On Sat, Feb 2, 2019 at 12:14 PM Artem Russakovskii <archon810@xxxxxxxxx> wrote:
The fuse crash happened again yesterday, to another volume. Are there any mount options that could help mitigate this?

In the meantime, I set up a monit (https://mmonit.com/monit/) task to watch and restart the mount, which works and recovers the mount point within a minute. Not ideal, but a temporary workaround.

By the way, the way to reproduce this "Transport endpoint is not connected" condition for testing purposes is to kill -9 the right "glusterfs --process-name fuse" process.
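
Something along these lines should locate it (the pgrep pattern is just my guess at a filter; double-check you're killing the fuse client for the intended mount and not a brick or self-heal daemon):

# list glusterfs processes with their full command lines, then kill the fuse client for the mount
pgrep -af 'glusterfs.*process-name fuse'
kill -9 <PID>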


monit check:
check filesystem glusterfs_data1 with path /mnt/glusterfs_data1
  start program  = "/bin/mount  /mnt/glusterfs_data1"
  stop program  = "/bin/umount /mnt/glusterfs_data1"
  if space usage > 90% for 5 times within 15 cycles
    then alert else if succeeded for 10 cycles then alert
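
After adding the check, reloading monit should pick it up and start watching the mount (assuming a standard monit installation):

monit reload
monit status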


stack trace:
[2019-02-01 23:22:00.312894] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
[2019-02-01 23:22:00.314051] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fa0249e4329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fa024bf5af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fa02cf5b218] ) 0-dict: dict is NULL [Invalid argument]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler" repeated 26 times between [2019-02-01 23:21:20.857333] and [2019-02-01 23:21:56.164427]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 0-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-3" repeated 27 times between [2019-02-01 23:21:11.142467] and [2019-02-01 23:22:03.474036]
pending frames:
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
signal received: 6
time of crash:
2019-02-01 23:22:03
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fa02cf6664c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fa02cf70cb6]
/lib64/libc.so.6(+0x36160)[0x7fa02c12d160]
/lib64/libc.so.6(gsignal+0x110)[0x7fa02c12d0e0]
/lib64/libc.so.6(abort+0x151)[0x7fa02c12e6c1]
/lib64/libc.so.6(+0x2e6fa)[0x7fa02c1256fa]
/lib64/libc.so.6(+0x2e772)[0x7fa02c125772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fa02c4bb0b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x5dc9d)[0x7fa025543c9d]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x70ba1)[0x7fa025556ba1]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x58f3f)[0x7fa0257dbf3f]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7fa02cd31820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fa02cd31b6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fa02cd2e063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fa02694e0b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fa02cfc44c3]
/lib64/libpthread.so.0(+0x7559)[0x7fa02c4b8559]
/lib64/libc.so.6(clone+0x3f)[0x7fa02c1ef81f]

Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC


On Fri, Feb 1, 2019 at 9:03 AM Artem Russakovskii <archon810@xxxxxxxxx> wrote:
Hi,

The first (and so far only) crash happened at 2am the next day after we upgraded, on only one of four servers, and it affected only one of the two mounts.

I have no idea what caused it, but yeah, we do have a pretty busy site (apkmirror.com), and it caused a disruption for any uploads or downloads from that server until I woke up and fixed the mount.

I wish I could be more helpful, but all I have is that stack trace.

I'm glad it's a blocker and will hopefully be resolved soon. 

On Thu, Jan 31, 2019, 7:26 PM Amar Tumballi Suryanarayan <atumball@xxxxxxxxxx> wrote:
Hi Artem,

Opened https://bugzilla.redhat.com/show_bug.cgi?id=1671603 (ie, as a clone of other bugs where recent discussions happened), and marked it as a blocker for glusterfs-5.4 release.

We already have fixes for the log flooding - https://review.gluster.org/22128 - and are in the process of identifying and fixing the issue seen with the crash.

Can you please tell us if the crashes happened as soon as you upgraded, or was there any particular pattern you observed before the crash?

-Amar


On Thu, Jan 31, 2019 at 11:40 PM Artem Russakovskii <archon810@xxxxxxxxx> wrote:
Within 24 hours after updating from the rock-solid 4.1 to 5.3, I already got a crash that others have mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1313567, and I had to unmount, kill gluster, and remount:


[2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-3" repeated 5 times between [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 72 times between [2019-01-31 09:37:53.746741] and [2019-01-31 09:38:04.696993]
pending frames:
frame : type(1) op(READ)
frame : type(1) op(OPEN)
frame : type(0) op(0)
signal received: 6
time of crash:
2019-01-31 09:38:04
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6]
/lib64/libc.so.6(+0x36160)[0x7fccd622d160]
/lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0]
/lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1]
/lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa]
/lib64/libc.so.6(+0x2e772)[0x7fccd6225772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3]
/lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559]
/lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f]
---------

Do the pending patches fix the crash or only the repeated warnings? I'm running glusterfs on OpenSUSE 15.0, installed via http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, and I'm not too sure how to make it core dump.
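
If anyone can confirm, I'm guessing the usual Linux recipe applies here too (treat this as a guess on my part, not something I've verified with the fuse client):

# allow unlimited core size in the shell that does the mount, then remount so glusterfs inherits it
ulimit -c unlimited

# write cores to a predictable location (%e = executable name, %p = pid)
sysctl -w kernel.core_pattern=/var/tmp/core.%e.%p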

If it's not fixed by the patches above, has anyone already opened a ticket for the crashes that I can join and monitor? This is going to create a massive problem for us since production systems are crashing.

Thanks.

Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC


On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> wrote:


On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii <archon810@xxxxxxxxx> wrote:
Also, not sure if it's related or not, but I got a ton of these "Failed to dispatch handler" messages in my logs as well. Many people have been commenting about this issue here: https://bugzilla.redhat.com/show_bug.cgi?id=1651246.



==> mnt-SITE_data1.log <==
[2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
==> mnt-SITE_data3.log <==
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 413 times between [2019-01-30 20:36:23.881090] and [2019-01-30 20:38:20.015593]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-0" repeated 42 times between [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306]
==> mnt-SITE_data1.log <==
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-0" repeated 50 times between [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and [2019-01-30 20:38:20.546355]
[2019-01-30 20:38:21.492319] I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-0
==> mnt-SITE_data3.log <==
[2019-01-30 20:38:22.349689] I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-0
==> mnt-SITE_data1.log <==
[2019-01-30 20:38:22.762941] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler 

I'm hoping raising the issue here on the mailing list may bring some additional eyeballs and get them both fixed.

Thanks.

Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC


On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii <archon810@xxxxxxxxx> wrote:
I found a similar issue here: https://bugzilla.redhat.com/show_bug.cgi?id=1313567. There's a comment from 3 days ago from someone else with 5.3 who started seeing the spam.

Here's the message that repeats over and over:
[2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]

+Milind Changire, can you check why this message is logged and send a fix?


Is there any fix for this issue?

Thanks.

Sincerely,
Artem

--
Founder, Android Police, APK Mirror, Illogical Robot LLC


--
Amar Tumballi (amarts)
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
