Re: Regression health for release-5.next and release-6

Xavi Hernandez <jahernan@xxxxxxxxxx> · Thu, 17 Jan 2019 09:21:47 +0100

On Thu, Jan 17, 2019 at 5:29 AM Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

On Tue, Jan 15, 2019 at 2:13 PM Atin Mukherjee <atin.mukherjee83@xxxxxxxxx> wrote:
Interesting. I’ll do a deep dive into it sometime this week.

On Tue, 15 Jan 2019 at 14:05, Xavi Hernandez <jahernan@xxxxxxxxxx> wrote:
On Mon, Jan 14, 2019 at 11:08 AM Ashish Pandey <aspandey@xxxxxxxxxx> wrote:

I downloaded the logs of regression runs 1077 and 1073 and tried to investigate.
In both runs, ec/bug-1236065.t hangs on TEST 70, which tries to get the online brick count.

I can see in the mount, brick and glusterd logs that nothing has moved forward after this test.
glusterd.log:

[2019-01-06 16:27:51.346408]:++++++++++ G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 70 5 online_brick_count ++++++++++
[2019-01-06 16:27:51.645014] I [MSGID: 106499] [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: Received status volume req for volume patchy
[2019-01-06 16:27:51.646664] I [dict.c:2745:dict_get_str_boolean] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x4a6c3) [0x7f4c37fe06c3] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x43b3a) [0x7f4c37fd9b3a] -->/build/install/lib/libglusterfs.so.0(dict_get_str_boolean+0x170) [0x7f4c433d83fb] ) 0-dict: key nfs.disable, integer type asked, has string type [Invalid argument]
[2019-01-06 16:27:51.647177] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick0.rdma_port, string type asked, has integer type [Invalid argument]
[2019-01-06 16:27:51.647227] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick1.rdma_port, string type asked, has integer type [Invalid argument]
[2019-01-06 16:27:51.647292] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick2.rdma_port, string type asked, has integer type [Invalid argument]
[2019-01-06 16:27:51.647333] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick3.rdma_port, string type asked, has integer type [Invalid argument]
[2019-01-06 16:27:51.647371] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick4.rdma_port, string type asked, has integer type [Invalid argument]
[2019-01-06 16:27:51.647409] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick5.rdma_port, string type asked, has integer type [Invalid argument]
[2019-01-06 16:27:51.647447] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick6.rdma_port, string type asked, has integer type [Invalid argument]
[2019-01-06 16:27:51.649335] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-06 16:27:51.932871] I [MSGID: 106499] [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: Received status volume req for volume patchy

It is just taking a lot of time to get the status at this point.
It looks like there could be some issue with the connection or with the handling of volume status when some bricks are down.

The 'online_brick_count' check uses 'gluster volume status' to get some information, and it does that several times (currently 7). Looking at cmd_history.log, I see that after the 'online_brick_count' at line 70, only one 'gluster volume status' has completed. Apparently the second 'gluster volume status' is hung.
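
As a rough aid for this kind of check, the following Python sketch lists the 'volume status' entries recorded in cmd_history.log at or after a given time. It assumes entries have the form "[timestamp]  : command : STATUS", as in the cmd_history.log excerpt quoted in this thread; the names ENTRY, parse_ts and commands_since are hypothetical and not part of the test framework.

```python
import re
from datetime import datetime

# Assumed entry format, e.g.:
# [2019-01-16 16:28:49.220491]  : volume start patchy force : SUCCESS
ENTRY = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s*:\s*(?P<cmd>.*?)\s*:\s*(?P<status>\S.*)$"
)


def parse_ts(text):
    """Parse a log timestamp such as '2019-01-16 16:28:49.220491'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


def commands_since(lines, since, prefix="volume status"):
    """Return (timestamp, command, status) for entries at or after `since`
    whose command starts with `prefix`. Lines that do not parse are skipped."""
    found = []
    for line in lines:
        m = ENTRY.match(line.strip())
        if not m:
            continue
        ts = parse_ts(m.group("ts"))
        if ts >= since and m.group("cmd").startswith(prefix):
            found.append((ts, m.group("cmd"), m.group("status")))
    return found
```

Comparing the number of returned entries with the number of status calls the check is expected to make shows how far it got before stalling.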

In cli.log I see that the second 'gluster volume status' seems to have started, but not finished:

Normal run:

[2019-01-08 16:36:43.628821] I [cli.c:834:main] 0-cli: Started running gluster with version 6dev
[2019-01-08 16:36:43.808182] I [MSGID: 101190] [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2019-01-08 16:36:43.808287] I [MSGID: 101190] [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-01-08 16:36:43.808432] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-08 16:36:43.816534] I [dict.c:1947:dict_get_uint32] (-->gluster(cli_cmd_process+0x1e4) [0x40db50] -->gluster(cli_cmd_volume_status_cbk+0x90) [0x415bec] -->/build/install/lib/libglusterfs.so.0(dict_get_uint32+0x176) [0x7fefe5694569] ) 0-dict: key cmd, unsigned integer type asked, has integer type [Invalid argument]
[2019-01-08 16:36:43.816716] I [dict.c:1947:dict_get_uint32] (-->gluster(cli_cmd_volume_status_cbk+0x1cb) [0x415d27] -->gluster(gf_cli_status_volume_all+0xc8) [0x42fa94] -->/build/install/lib/libglusterfs.so.0(dict_get_uint32+0x176) [0x7fefe5694569] ) 0-dict: key cmd, unsigned integer type asked, has integer type [Invalid argument]
[2019-01-08 16:36:43.824437] I [input.c:31:cli_batch] 0-: Exiting with: 0

Bad run:

[2019-01-08 16:36:43.940361] I [cli.c:834:main] 0-cli: Started running gluster with version 6dev
[2019-01-08 16:36:44.147364] I [MSGID: 101190] [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2019-01-08 16:36:44.147477] I [MSGID: 101190] [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-01-08 16:36:44.147583] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler

In glusterd.log it seems as if it hasn't received any status request. It looks like the cli has not even connected to glusterd.
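
A small Python sketch of how such unfinished invocations could be spotted in cli.log automatically. It relies on two assumptions: every CLI run logs "Started running gluster with version" first and "Exiting with:" last (as in the normal run above), and runs do not overlap in the log. The function name is hypothetical.

```python
import re

START = "Started running gluster with version"
EXIT = "Exiting with:"
# Leading "[YYYY-MM-DD HH:MM:SS.ffffff]" timestamp of a log line
TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\]")


def unfinished_invocations(lines):
    """Return the start timestamps of CLI runs that never logged an exit.

    Assumes runs are sequential: a new start line (or end of input) while
    a previous start is still pending means that run did not finish.
    """
    pending = None      # timestamp of the start still waiting for its exit
    unfinished = []
    for line in lines:
        m = TS.match(line)
        if not m:
            continue
        if START in line:
            if pending is not None:
                unfinished.append(pending)
            pending = m.group(1)
        elif EXIT in line:
            pending = None
    if pending is not None:
        unfinished.append(pending)
    return unfinished
```

Fed with the two runs quoted above, this would report only the start of the bad run, since that one has no "Exiting with:" line.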

I downloaded the logs for the recent failure from https://build.gluster.org/job/regression-test-with-multiplex/1092/ and, based on scanning them, this is what I see:

1. The test executes without any issues till line no 74 i.e. "TEST $CLI volume start $V0 force", and cli.log along with cmd_history.log confirm the same:

cli.log
====
[2019-01-16 16:28:46.871877]:++++++++++ G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 73 gluster --mode=script --wignore volume start patchy force ++++++++++
[2019-01-16 16:28:46.980780] I [cli.c:834:main] 0-cli: Started running gluster with version 6dev
[2019-01-16 16:28:47.185996] I [MSGID: 101190] [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2019-01-16 16:28:47.186113] I [MSGID: 101190] [event-epoll.c:675:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-01-16 16:28:47.186234] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-16 16:28:49.223376] I [cli-rpc-ops.c:1448:gf_cli_start_volume_cbk] 0-cli: Received resp to start volume <=== successfully processed the callback
[2019-01-16 16:28:49.223668] I [input.c:31:cli_batch] 0-: Exiting with: 0  

cmd_history.log
============
[2019-01-16 16:28:49.220491]  : volume start patchy force : SUCCESS

However, in both the cli and cmd_history log files these are the last entries I see, which indicates that the test script is completely paused. I see no possibility of the cli receiving a command and dropping it completely, as otherwise we should have at least seen the "Started running gluster with version 6dev" and "Exiting with" log entries.

I managed to reproduce this once locally on my system, and when I then ran commands from another prompt, volume status and all other basic gluster commands went through. I also inspected the processes and didn't see any sign of a process being hung.

So the mystery continues, and we need to see why the test script is not moving forward at all.

An additional thing that could be interesting: in all cases where I've seen this test hang, the next test shows an error during cleanup:

Aborting.

/mnt/nfs/1 could not be deleted, here are the left over items
drwxr-xr-x. 2 root root 6 Jan 16 16:41 /d/backends
drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/glusterfs/0
drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/glusterfs/1
drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/glusterfs/2
drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/glusterfs/3
drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/nfs/0
drwxr-xr-x. 2 root root 4096 Jan 16 16:41 /mnt/nfs/1

Please correct the problem and try again.
This is a bit weird, since this only happens after having removed all these directories with an 'rm -rf', and this command doesn't exit on the first error, so at least some of these directories should have been removed, even if the mount process is hung (the nfs mounts and fuse mounts 1, 2 and 3 are not used by the test at all). The only explanation I have is that the cleanup function is being executed twice concurrently (probably from two different scripts). The first cleanup is blocked (or is taking a lot of time) removing one of the directories. In the meantime the other cleanup has completed and recreated the directories, so when the first one finally finishes, it finds all directories still there and writes the above messages. This would also mean that something is not properly killed between tests. Not sure if that's possible.
This could match your findings, since some commands executed in the second script could "unblock" whatever is blocked in the first one, causing it to progress and show the final error.
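
To make the suspected interleaving concrete, here is a toy Python model of it. This is only an illustration of the hypothesis, not the real cleanup code: the directory names are made up, and the stall is placed right after the first cleanup's removals for simplicity.

```python
import os
import shutil
import tempfile
import threading


def simulate_overlapping_cleanups(names=("glusterfs0", "nfs0", "nfs1")):
    """Toy model: return the directories the first cleanup reports as left over."""
    root = tempfile.mkdtemp()
    dirs = [os.path.join(root, n) for n in names]
    for d in dirs:
        os.mkdir(d)

    removed = threading.Event()     # first cleanup has issued its removals
    recreated = threading.Event()   # second script has recreated the directories
    leftovers = []

    def first_cleanup():
        for d in dirs:
            shutil.rmtree(d, ignore_errors=True)
        removed.set()
        recreated.wait()            # stands in for the blocked/slow removal
        # The "could not be deleted" check runs only after the stall.
        leftovers.extend(os.path.basename(d) for d in dirs if os.path.isdir(d))

    def second_script():
        removed.wait()
        for d in dirs:              # its own cleanup finds nothing to delete...
            shutil.rmtree(d, ignore_errors=True)
        for d in dirs:              # ...and its setup recreates the directories
            os.mkdir(d)
        recreated.set()

    threads = [threading.Thread(target=first_cleanup),
               threading.Thread(target=second_script)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    shutil.rmtree(root)
    return leftovers
```

In this model the first cleanup sees every directory as left over even though its own removals succeeded, which is the pattern in the error output above.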

Could this explain something?

Xavi

---
Ashish

From: "Mohit Agrawal" <moagrawa@xxxxxxxxxx>
To: "Shyam Ranganathan" <srangana@xxxxxxxxxx>
Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
Sent: Saturday, January 12, 2019 6:46:20 PM
Subject: Re: Regression health for release-5.next and release-6

The previous logs were from the client, not the bricks; below are the brick logs.

[2019-01-12 12:25:25.893485]:++++++++++ G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 68 rm -f 0.o 10.o 11.o 12.o 13.o 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o ++++++++++
The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.ec.size' would not be sent on wire in the future [Invalid argument]" repeated 199 times between [2019-01-12 12:25:25.283989] and [2019-01-12 12:25:25.899532]
[2019-01-12 12:25:25.903375] E [MSGID: 113001] [posix-inode-fd-ops.c:4617:_posix_handle_xattr_keyvalue_pair] 8-patchy-posix: fgetxattr failed on gfid=d91f6331-d394-479d-ab51-6bcf674ac3e0 while doing xattrop: Key:trusted.ec.dirty (Bad file descriptor) [Bad file descriptor]
[2019-01-12 12:25:25.903468] E [MSGID: 115073] [server-rpc-fops_v2.c:1805:server4_fxattrop_cbk] 0-patchy-server: 1486: FXATTROP 2 (d91f6331-d394-479d-ab51-6bcf674ac3e0), client: CTX_ID:b785c2b0-3453-4a03-b129-19e6ceeb5346-GRAPH_ID:0-PID:24147-HOST:softserve-moagrawa-test.1-PC_NAME:patchy-client-1-RECON_NO:-1, error-xlator: patchy-posix [Bad file descriptor]

Thanks,
Mohit Agrawal

On Sat, Jan 12, 2019 at 6:29 PM Mohit Agrawal <moagrawa@xxxxxxxxxx> wrote:

Specifically for "add-brick-and-validate-replicated-volume-options.t", I have posted a patch: https://review.gluster.org/22015.
For test case "ec/bug-1236065.t", I think the issue needs to be checked by the ec team.

On the brick side, it is showing the logs below:

>>>>>>>>>>>>>>>>>

on wire in the future [Invalid argument]
The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.ec.dirty' would not be sent on wire in the future [Invalid argument]" repeated 3 times between [2019-01-12 12:25:25.902828] and [2019-01-12 12:25:25.902992]
[2019-01-12 12:25:25.903553] W [MSGID: 114031] [client-rpc-fops_v2.c:1614:client4_0_fxattrop_cbk] 0-patchy-client-1: remote operation failed [Bad file descriptor]
[2019-01-12 12:25:25.903998] W [MSGID: 122040] [ec-common.c:1181:ec_prepare_update_cbk] 0-patchy-disperse-0: Failed to get size and version :  FOP : 'FXATTROP' failed on gfid d91f6331-d394-479d-ab51-6bcf674ac3e0 [Input/output error]
[2019-01-12 12:25:25.904059] W [fuse-bridge.c:1907:fuse_unlink_cbk] 0-glusterfs-fuse: 3259: UNLINK() /test/0.o => -1 (Input/output error)

>>>>>>>>>>>>>>>>>>>

The test case is timing out because the "volume heal $V0 full" command is stuck; it looks like shd is getting stuck at getxattr.

>>>>>>>>>>>>>>.

Thread 8 (Thread 0x7f83777fe700 (LWP 25552)):
#0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f83777fdbb0, dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680
#2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030880, child=<optimized out>, loc=0x7f83777fdbb0, full=<optimized out>) at ec-heald.c:161
#3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80094b0, entry=<optimized out>, parent=0x7f83777fdde0, data="" at ec-heald.c:294
#4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80094b0, loc=loc@entry=0x7f83777fdde0, pid=pid@entry=-6, data="" fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
#5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a8030880, inode=<optimized out>) at ec-heald.c:311
#6  0x00007f83add0367b in ec_shd_full_healer (data="" at ec-heald.c:372
#7  0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
Thread 7 (Thread 0x7f8376ffd700 (LWP 25553)):
#0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f8376ffcbb0, dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680
#2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80308f0, child=<optimized out>, loc=0x7f8376ffcbb0, full=<optimized out>) at ec-heald.c:161
#3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a800d110, entry=<optimized out>, parent=0x7f8376ffcde0, data="" at ec-heald.c:294
#4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a800d110, loc=loc@entry=0x7f8376ffcde0, pid=pid@entry=-6, data="" fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
#5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a80308f0, inode=<optimized out>) at ec-heald.c:311
#6  0x00007f83add0367b in ec_shd_full_healer (data="" at ec-heald.c:372
#7  0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
Thread 6 (Thread 0x7f83767fc700 (LWP 25554)):
#0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f83767fbbb0, dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680
#2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030960, child=<optimized out>, loc=0x7f83767fbbb0, full=<optimized out>) at ec-heald.c:161
#3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8010af0, entry=<optimized out>, parent=0x7f83767fbde0, data="" at ec-heald.c:294
#4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8010af0, loc=loc@entry=0x7f83767fbde0, pid=pid@entry=-6, data="" fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
#5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a8030960, inode=<optimized out>) at ec-heald.c:311
#6  0x00007f83add0367b in ec_shd_full_healer (data="" at ec-heald.c:372
#7  0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
Thread 5 (Thread 0x7f8375ffb700 (LWP 25555)):
#0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f8375ffabb0, dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680
#2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80309d0, child=<optimized out>, loc=0x7f8375ffabb0, full=<optimized out>) at ec-heald.c:161
#3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80144d0, entry=<optimized out>, parent=0x7f8375ffade0, data="" at ec-heald.c:294
#4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80144d0, loc=loc@entry=0x7f8375ffade0, pid=pid@entry=-6, data="" fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
#5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a80309d0, inode=<optimized out>) at ec-heald.c:311
#6  0x00007f83add0367b in ec_shd_full_healer (data="" at ec-heald.c:372
#7  0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
Thread 4 (Thread 0x7f83757fa700 (LWP 25556)):
#0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f83757f9bb0, dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680
#2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030a40, child=<optimized out>, loc=0x7f83757f9bb0, full=<optimized out>) at ec-heald.c:161
#3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8017eb0, entry=<optimized out>, parent=0x7f83757f9de0, data="" at ec-heald.c:294
#4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8017eb0, loc=loc@entry=0x7f83757f9de0, pid=pid@entry=-6, data="" fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
#5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a8030a40, inode=<optimized out>) at ec-heald.c:311
#6  0x00007f83add0367b in ec_shd_full_healer (data="" at ec-heald.c:372
#7  0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
Thread 3 (Thread 0x7f8374ff9700 (LWP 25557)):
#0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f8374ff8bb0, dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680
#2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030ab0, child=<optimized out>, loc=0x7f8374ff8bb0, full=<optimized out>) at ec-heald.c:161
#3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801b890, entry=<optimized out>, parent=0x7f8374ff8de0, data="" at ec-heald.c:294
#4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801b890, loc=loc@entry=0x7f8374ff8de0, pid=pid@entry=-6, data="" fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
#5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a8030ab0, inode=<optimized out>) at ec-heald.c:311
#6  0x00007f83add0367b in ec_shd_full_healer (data="" at ec-heald.c:372
#7  0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
Thread 2 (Thread 0x7f8367fff700 (LWP 25558)):
#0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f8367ffebb0, dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680
#2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030b20, child=<optimized out>, loc=0x7f8367ffebb0, full=<optimized out>) at ec-heald.c:161
#3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801f270, entry=<optimized out>, parent=0x7f8367ffede0, data="" at ec-heald.c:294
#4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801f270, loc=loc@entry=0x7f8367ffede0, pid=pid@entry=-6, data="" fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
#5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a8030b20, inode=<optimized out>) at ec-heald.c:311
#6  0x00007f83add0367b in ec_shd_full_healer (data="" at ec-heald.c:372
#7  0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
Thread 1 (Thread 0x7f83bcdd1780 (LWP 25383)):
#0  0x00007f83bb70af57 in pthread_join () from /usr/lib64/libpthread.so.0
#1  0x00007f83bc92eff8 in event_dispatch_epoll (event_pool=0x55af0a6dd560) at event-epoll.c:846
#2  0x000055af0a4116b8 in main (argc=15, argv=0x7fff75610898) at glusterfsd.c:2848

>>>>>>>>>>>>>>>>>>>>>>>>>>.
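
With this many near-identical stacks, a quick grouping makes the pattern easier to see. Below is a minimal Python sketch that groups threads from a gdb backtrace dump by their innermost frames. It assumes the "Thread N (...)" and "#N  0x... in function (...)" line shapes shown above; the function name group_threads and the depth parameter are hypothetical.

```python
import re
from collections import defaultdict

THREAD = re.compile(r"^Thread (\d+) \(")
# Frame lines look like "#1  0x00007f83bc910e5b in syncop_getxattr (...)";
# the address part is treated as optional.
FRAME = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")


def group_threads(lines, depth=3):
    """Map each distinct tuple of the `depth` innermost function names
    to the list of thread numbers whose stacks start with it."""
    stacks = defaultdict(list)   # thread number -> function names, innermost first
    current = None
    for line in lines:
        t = THREAD.match(line)
        if t:
            current = int(t.group(1))
            continue
        f = FRAME.match(line)
        if f and current is not None:
            stacks[current].append(f.group(2))
    groups = defaultdict(list)
    for thread, funcs in stacks.items():
        groups[tuple(funcs[:depth])].append(thread)
    return dict(groups)
```

Applied to the dump above, threads 2 to 8 should fall into a single group waiting in syncop_getxattr, with thread 1 on its own in pthread_join.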

Thanks,
Mohit Agrawal

On Fri 11 Jan, 2019, 21:20 Shyam Ranganathan <srangana@xxxxxxxxxx> wrote:
We can check health on master post the patch as stated by Mohit below.

Release-5 is causing some concerns as we need to tag the release
yesterday, but we have the following 2 tests failing or coredumping
pretty regularly, need attention on these.

ec/bug-1236065.t
glusterd/add-brick-and-validate-replicated-volume-options.t

Shyam

On 1/10/19 6:20 AM, Mohit Agrawal wrote:

> I think we should consider regression builds after the patch
> (https://review.gluster.org/#/c/glusterfs/+/21990/) was merged,
> as we know this patch introduced some delay.
>
> Thanks,
> Mohit Agrawal
>
> On Thu, Jan 10, 2019 at 3:55 PM Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
>
>     Mohit, Sanju - request you to investigate the failures related to
>     glusterd and brick-mux and report back to the list.
>
>     On Thu, Jan 10, 2019 at 12:25 AM Shyam Ranganathan
>     <srangana@xxxxxxxxxx> wrote:
>
>         Hi,
>
>         As part of branching preparation next week for release-6, please
>         find test failures and respective test links here [1].
>
>         The top tests that are failing/dumping-core are as below and
>         need attention,
>         - ec/bug-1236065.t
>         - glusterd/add-brick-and-validate-replicated-volume-options.t
>         - readdir-ahead/bug-1390050.t
>         - glusterd/brick-mux-validation.t
>         - bug-1432542-mpx-restart-crash.t
>
>         Others of interest,
>         - replicate/bug-1341650.t
>
>         Please file a bug if needed against the test case and report the
>         same here. In case a problem is already addressed, please send back
>         the patch details that address this issue as a response to this mail.
>
>         Thanks,
>         Shyam
>
>         [1] Regression failures:
>         https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view
>         _______________________________________________
>         Gluster-devel mailing list
>         Gluster-devel@xxxxxxxxxxx
>         https://lists.gluster.org/mailman/listinfo/gluster-devel

-- 
--Atin