Within 24 hours after updating from rock solid 4.1 to 5.3, I already got a crash which others have mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=1313567 and had to unmount, kill gluster, and remount:
[2019-01-31 09:38:04.317604] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.319308] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.320047] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
[2019-01-31 09:38:04.320677] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fcccafcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fcccb1deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fccd705b218] ) 2-dict: dict is NULL [Invalid argument]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-3" repeated 5 times between [2019-01-31 09:37:54.751905] and [2019-01-31 09:38:03.958061]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 72 times between [2019-01-31 09:37:53.746741] and [2019-01-31 09:38:04.696993]
pending frames:
frame : type(1) op(READ)
frame : type(1) op(OPEN)
frame : type(0) op(0)
patchset: git://git.gluster.org/glusterfs.git
signal received: 6
time of crash:
2019-01-31 09:38:04
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 5.3
/usr/lib64/libglusterfs.so.0(+0x2764c)[0x7fccd706664c]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x306)[0x7fccd7070cb6]
/lib64/libc.so.6(+0x36160)[0x7fccd622d160]
/lib64/libc.so.6(gsignal+0x110)[0x7fccd622d0e0]
/lib64/libc.so.6(abort+0x151)[0x7fccd622e6c1]
/lib64/libc.so.6(+0x2e6fa)[0x7fccd62256fa]
/lib64/libc.so.6(+0x2e772)[0x7fccd6225772]
/lib64/libpthread.so.0(pthread_mutex_lock+0x228)[0x7fccd65bb0b8]
/usr/lib64/glusterfs/5.3/xlator/cluster/replicate.so(+0x32c4d)[0x7fcccbb01c4d]
/usr/lib64/glusterfs/5.3/xlator/protocol/client.so(+0x65778)[0x7fcccbdd1778]
/usr/lib64/libgfrpc.so.0(+0xe820)[0x7fccd6e31820]
/usr/lib64/libgfrpc.so.0(+0xeb6f)[0x7fccd6e31b6f]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fccd6e2e063]
/usr/lib64/glusterfs/5.3/rpc-transport/socket.so(+0xa0b2)[0x7fccd0b7e0b2]
/usr/lib64/libglusterfs.so.0(+0x854c3)[0x7fccd70c44c3]
/lib64/libpthread.so.0(+0x7559)[0x7fccd65b8559]
/lib64/libc.so.6(clone+0x3f)[0x7fccd62ef81f]
---------
Do the pending patches fix the crash or only the repeated warnings? I'm running glusterfs on OpenSUSE 15.0 installed via http://download.opensuse.org/repositories/home:/glusterfs:/Leap15-5/openSUSE_Leap_15.0/, not too sure how to make it core dump.
If it's not fixed by the patches above, has anyone already opened a ticket for the crashes that I can join and monitor? This is going to create a massive problem for us since production systems are crashing.
Thanks.
On Wed, Jan 30, 2019 at 6:37 PM Raghavendra Gowdappa <rgowdapp@xxxxxxxxxx> wrote:
On Thu, Jan 31, 2019 at 2:14 AM Artem Russakovskii <archon810@xxxxxxxxx> wrote:Also, not sure if related or not, but I got a ton of these "Failed to dispatch handler" in my logs as well. Many people have been commenting about this issue here https://bugzilla.redhat.com/show_bug.cgi?id=1651246.https://review.gluster.org/#/c/glusterfs/+/22046/ addresses this.==> mnt-SITE_data1.log <==
[2019-01-30 20:38:20.783713] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]
==> mnt-SITE_data3.log <==
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 413 times between [2019-01-30 20:36:23.881090] and [2019-01-30 20:38:20.015593]
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-0" repeated 42 times between [2019-01-30 20:36:23.290287] and [2019-01-30 20:38:20.280306]
==> mnt-SITE_data1.log <==
The message "I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-0" repeated 50 times between [2019-01-30 20:36:22.247367] and [2019-01-30 20:38:19.459789]
The message "E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handler" repeated 2654 times between [2019-01-30 20:36:22.667327] and [2019-01-30 20:38:20.546355]
[2019-01-30 20:38:21.492319] I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data1-replicate-0: selecting local read_child SITE_data1-client-0
==> mnt-SITE_data3.log <==
[2019-01-30 20:38:22.349689] I [MSGID: 108031] [afr-common.c:2543:afr_local_discovery_cbk] 2-SITE_data3-replicate-0: selecting local read_child SITE_data3-client-0
==> mnt-SITE_data1.log <==
[2019-01-30 20:38:22.762941] E [MSGID: 101191] [event-epoll.c:671:event_dispatch_epoll_worker] 2-epoll: Failed to dispatch handlerI'm hoping raising the issue here on the mailing list may bring some additional eyeballs and get them both fixed.Thanks.On Wed, Jan 30, 2019 at 12:26 PM Artem Russakovskii <archon810@xxxxxxxxx> wrote:I found a similar issue here: https://bugzilla.redhat.com/show_bug.cgi?id=1313567. There's a comment from 3 days ago from someone else with 5.3 who started seeing the spam.Here's the command that repeats over and over:[2019-01-30 20:23:24.481581] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fd966fcd329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fd9671deaf5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fd9731ea218] ) 2-dict: dict is NULL [Invalid argument]+Milind Changire Can you check why this message is logged and send a fix?_______________________________________________Is there any fix for this issue?Thanks.
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
_______________________________________________ Gluster-users mailing list Gluster-users@xxxxxxxxxxx https://lists.gluster.org/mailman/listinfo/gluster-users