Re: Regression health for release-5.next and release-6

I downloaded the logs of regression runs 1077 and 1073 and tried to investigate the failures.
In both regression runs, ec/bug-1236065.t hangs on TEST 70, which tries to get the online brick count.

I can see in the mount, brick, and glusterd logs that nothing moves forward after this test.
glusterd.log:

[2019-01-06 16:27:51.346408]:++++++++++ G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 70 5 online_brick_count ++++++++++
[2019-01-06 16:27:51.645014] I [MSGID: 106499] [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: Received status volume req for volume patchy
[2019-01-06 16:27:51.646664] I [dict.c:2745:dict_get_str_boolean] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x4a6c3) [0x7f4c37fe06c3] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x43b3a) [0x7f4c37fd9b3a] -->/build/install/lib/libglusterfs.so.0(dict_get_str_boolean+0x170) [0x7f4c433d83fb] ) 0-dict: key nfs.disable, integer type asked, has string type [Invalid argument]
[2019-01-06 16:27:51.647177] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick0.rdma_port, string type asked, has integer type [Invalid argument]
[2019-01-06 16:27:51.647227] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick1.rdma_port, string type asked, has integer type [Invalid argument]
[2019-01-06 16:27:51.647292] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick2.rdma_port, string type asked, has integer type [Invalid argument]
[2019-01-06 16:27:51.647333] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick3.rdma_port, string type asked, has integer type [Invalid argument]
[2019-01-06 16:27:51.647371] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick4.rdma_port, string type asked, has integer type [Invalid argument]
[2019-01-06 16:27:51.647409] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick5.rdma_port, string type asked, has integer type [Invalid argument]
[2019-01-06 16:27:51.647447] I [dict.c:2361:dict_get_strn] (-->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0xffa32) [0x7f4c38095a32] -->/build/install/lib/glusterfs/6dev/xlator/mgmt/glusterd.so(+0x474ac) [0x7f4c37fdd4ac] -->/build/install/lib/libglusterfs.so.0(dict_get_strn+0x179) [0x7f4c433d7673] ) 0-dict: key brick6.rdma_port, string type asked, has integer type [Invalid argument]
[2019-01-06 16:27:51.649335] E [MSGID: 101191] [event-epoll.c:759:event_dispatch_epoll_worker] 0-epoll: Failed to dispatch handler
[2019-01-06 16:27:51.932871] I [MSGID: 106499] [glusterd-handler.c:4404:__glusterd_handle_status_volume] 0-management: Received status volume req for volume patchy

It is just taking a lot of time to get the status at this point.
It looks like there could be some issue with the connection, or with the handling of volume status when some bricks are down.
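
To narrow that down, the status query can be timed by hand and the online count approximated outside the test; a rough bash sketch (this is not the exact online_brick_count helper from the test framework, and the awk field position is an assumption based on the usual status output layout):

    # Time the status query itself; in the hung runs this is the step that stalls.
    time gluster volume status patchy

    # Approximate the online brick count: count "Y" in the Online column
    # (assumed to be the second-to-last field of each Brick line).
    gluster volume status patchy | awk '/^Brick/ && $(NF-1) == "Y" {n++} END {print n+0}'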

---
Ashish




From: "Mohit Agrawal" <moagrawa@xxxxxxxxxx>
To: "Shyam Ranganathan" <srangana@xxxxxxxxxx>
Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
Sent: Saturday, January 12, 2019 6:46:20 PM
Subject: Re: Regression health for release-5.next and release-6

The previous logs are from the client, not the bricks; below are the brick logs:

[2019-01-12 12:25:25.893485]:++++++++++ G_LOG:./tests/bugs/ec/bug-1236065.t: TEST: 68 rm -f 0.o 10.o 11.o 12.o 13.o 14.o 15.o 16.o 17.o 18.o 19.o 1.o 2.o 3.o 4.o 5.o 6.o 7.o 8.o 9.o ++++++++++
The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.ec.size' would not be sent on wire in the future [Invalid argument]" repeated 199 times between [2019-01-12 12:25:25.283989] and [2019-01-12 12:25:25.899532]
[2019-01-12 12:25:25.903375] E [MSGID: 113001] [posix-inode-fd-ops.c:4617:_posix_handle_xattr_keyvalue_pair] 8-patchy-posix: fgetxattr failed on gfid=d91f6331-d394-479d-ab51-6bcf674ac3e0 while doing xattrop: Key:trusted.ec.dirty (Bad file descriptor) [Bad file descriptor]
[2019-01-12 12:25:25.903468] E [MSGID: 115073] [server-rpc-fops_v2.c:1805:server4_fxattrop_cbk] 0-patchy-server: 1486: FXATTROP 2 (d91f6331-d394-479d-ab51-6bcf674ac3e0), client: CTX_ID:b785c2b0-3453-4a03-b129-19e6ceeb5346-GRAPH_ID:0-PID:24147-HOST:softserve-moagrawa-test.1-PC_NAME:patchy-client-1-RECON_NO:-1, error-xlator: patchy-posix [Bad file descriptor]
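
For anyone reproducing this locally, the EC metadata that FXATTROP is failing on can be read directly off the brick backend with getfattr; a rough sketch, where the brick root is a placeholder, the gfid comes from the log above, and the two-level .glusterfs gfid layout is assumed:

    BRICK=/d/backends/patchy1   # placeholder; use the real brick root
    GFID=d91f6331-d394-479d-ab51-6bcf674ac3e0
    # gfid-indexed backend path: .glusterfs/<first 2 hex chars>/<next 2 chars>/<gfid>
    getfattr -d -m 'trusted.ec' -e hex \
      "$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"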


Thanks,
Mohit Agrawal

On Sat, Jan 12, 2019 at 6:29 PM Mohit Agrawal <moagrawa@xxxxxxxxxx> wrote:

For specific to "add-brick-and-validate-replicated-volume-options.t" i have posted a patch https://review.gluster.org/22015.
For test case "ec/bug-1236065.t" I think the issue needs to be check by ec team

On the brick side, it is showing below logs 

>>>>>>>>>>>>>>>>>

on wire in the future [Invalid argument]
The message "I [MSGID: 101016] [glusterfs3.h:746:dict_to_xdr] 0-dict: key 'trusted.ec.dirty' would not be sent on wire in the future [Invalid argument]" repeated 3 times between [2019-01-12 12:25:25.902828] and [2019-01-12 12:25:25.902992]
[2019-01-12 12:25:25.903553] W [MSGID: 114031] [client-rpc-fops_v2.c:1614:client4_0_fxattrop_cbk] 0-patchy-client-1: remote operation failed [Bad file descriptor]
[2019-01-12 12:25:25.903998] W [MSGID: 122040] [ec-common.c:1181:ec_prepare_update_cbk] 0-patchy-disperse-0: Failed to get size and version :  FOP : 'FXATTROP' failed on gfid d91f6331-d394-479d-ab51-6bcf674ac3e0 [Input/output error]
[2019-01-12 12:25:25.904059] W [fuse-bridge.c:1907:fuse_unlink_cbk] 0-glusterfs-fuse: 3259: UNLINK() /test/0.o => -1 (Input/output error)

>>>>>>>>>>>>>>>>>>>

The test case is timing out because the "volume heal $V0 full" command is stuck; it looks like shd is stuck in getxattr.

>>>>>>>>>>>>>>.

Thread 8 (Thread 0x7f83777fe700 (LWP 25552)):
#0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f83777fdbb0, dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680
#2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030880, child=<optimized out>, loc=0x7f83777fdbb0, full=<optimized out>) at ec-heald.c:161
#3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80094b0, entry=<optimized out>, parent=0x7f83777fdde0, data=<optimized out>) at ec-heald.c:294
#4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80094b0, loc=loc@entry=0x7f83777fdde0, pid=pid@entry=-6, data=<optimized out>, fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
#5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a8030880, inode=<optimized out>) at ec-heald.c:311
#6  0x00007f83add0367b in ec_shd_full_healer (data=<optimized out>) at ec-heald.c:372
#7  0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
Thread 7 (Thread 0x7f8376ffd700 (LWP 25553)):
#0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f8376ffcbb0, dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680
#2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80308f0, child=<optimized out>, loc=0x7f8376ffcbb0, full=<optimized out>) at ec-heald.c:161
#3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a800d110, entry=<optimized out>, parent=0x7f8376ffcde0, data=<optimized out>) at ec-heald.c:294
#4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a800d110, loc=loc@entry=0x7f8376ffcde0, pid=pid@entry=-6, data=<optimized out>, fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
#5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a80308f0, inode=<optimized out>) at ec-heald.c:311
#6  0x00007f83add0367b in ec_shd_full_healer (data=<optimized out>) at ec-heald.c:372
#7  0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
Thread 6 (Thread 0x7f83767fc700 (LWP 25554)):
#0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f83767fbbb0, dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680
#2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030960, child=<optimized out>, loc=0x7f83767fbbb0, full=<optimized out>) at ec-heald.c:161
#3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8010af0, entry=<optimized out>, parent=0x7f83767fbde0, data=<optimized out>) at ec-heald.c:294
#4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8010af0, loc=loc@entry=0x7f83767fbde0, pid=pid@entry=-6, data=<optimized out>, fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
#5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a8030960, inode=<optimized out>) at ec-heald.c:311
#6  0x00007f83add0367b in ec_shd_full_healer (data=<optimized out>) at ec-heald.c:372
#7  0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
Thread 5 (Thread 0x7f8375ffb700 (LWP 25555)):
#0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f8375ffabb0, dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680
#2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a80309d0, child=<optimized out>, loc=0x7f8375ffabb0, full=<optimized out>) at ec-heald.c:161
#3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a80144d0, entry=<optimized out>, parent=0x7f8375ffade0, data=<optimized out>) at ec-heald.c:294
#4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a80144d0, loc=loc@entry=0x7f8375ffade0, pid=pid@entry=-6, data=<optimized out>, fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
#5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a80309d0, inode=<optimized out>) at ec-heald.c:311
#6  0x00007f83add0367b in ec_shd_full_healer (data=<optimized out>) at ec-heald.c:372
#7  0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
Thread 4 (Thread 0x7f83757fa700 (LWP 25556)):
#0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f83757f9bb0, dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680
#2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030a40, child=<optimized out>, loc=0x7f83757f9bb0, full=<optimized out>) at ec-heald.c:161
#3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a8017eb0, entry=<optimized out>, parent=0x7f83757f9de0, data=<optimized out>) at ec-heald.c:294
#4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a8017eb0, loc=loc@entry=0x7f83757f9de0, pid=pid@entry=-6, data=<optimized out>, fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
#5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a8030a40, inode=<optimized out>) at ec-heald.c:311
#6  0x00007f83add0367b in ec_shd_full_healer (data=<optimized out>) at ec-heald.c:372
#7  0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
Thread 3 (Thread 0x7f8374ff9700 (LWP 25557)):
#0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f8374ff8bb0, dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680
#2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030ab0, child=<optimized out>, loc=0x7f8374ff8bb0, full=<optimized out>) at ec-heald.c:161
#3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801b890, entry=<optimized out>, parent=0x7f8374ff8de0, data=<optimized out>) at ec-heald.c:294
#4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801b890, loc=loc@entry=0x7f8374ff8de0, pid=pid@entry=-6, data=<optimized out>, fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
#5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a8030ab0, inode=<optimized out>) at ec-heald.c:311
#6  0x00007f83add0367b in ec_shd_full_healer (data=<optimized out>) at ec-heald.c:372
#7  0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
Thread 2 (Thread 0x7f8367fff700 (LWP 25558)):
#0  0x00007f83bb70d945 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0
#1  0x00007f83bc910e5b in syncop_getxattr (subvol=<optimized out>, loc=loc@entry=0x7f8367ffebb0, dict=dict@entry=0x0, key=key@entry=0x7f83add06a28 "trusted.ec.heal", xdata_in=xdata_in@entry=0x0, xdata_out=xdata_out@entry=0x0) at syncop.c:1680
#2  0x00007f83add02f27 in ec_shd_selfheal (healer=0x7f83a8030b20, child=<optimized out>, loc=0x7f8367ffebb0, full=<optimized out>) at ec-heald.c:161
#3  0x00007f83add0325b in ec_shd_full_heal (subvol=0x7f83a801f270, entry=<optimized out>, parent=0x7f8367ffede0, data=<optimized out>) at ec-heald.c:294
#4  0x00007f83bc930ac2 in syncop_ftw (subvol=0x7f83a801f270, loc=loc@entry=0x7f8367ffede0, pid=pid@entry=-6, data=<optimized out>, fn=fn@entry=0x7f83add03140 <ec_shd_full_heal>) at syncop-utils.c:125
#5  0x00007f83add03534 in ec_shd_full_sweep (healer=healer@entry=0x7f83a8030b20, inode=<optimized out>) at ec-heald.c:311
#6  0x00007f83add0367b in ec_shd_full_healer (data=<optimized out>) at ec-heald.c:372
#7  0x00007f83bb709e25 in start_thread () from /usr/lib64/libpthread.so.0
#8  0x00007f83bafd634d in clone () from /usr/lib64/libc.so.6
Thread 1 (Thread 0x7f83bcdd1780 (LWP 25383)):
#0  0x00007f83bb70af57 in pthread_join () from /usr/lib64/libpthread.so.0
#1  0x00007f83bc92eff8 in event_dispatch_epoll (event_pool=0x55af0a6dd560) at event-epoll.c:846
#2  0x000055af0a4116b8 in main (argc=15, argv=0x7fff75610898) at glusterfsd.c:2848


>>>>>>>>>>>>>>>>>>>>>>>>>>.
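
A stack like the one above can be captured non-interactively from the stuck shd process; a rough sketch (the pgrep pattern is an assumption, and gdb must be installed on the test node):

    shd_pid=$(pgrep -f glustershd)   # match pattern assumed; adjust per setup
    # Dump every thread's backtrace in batch mode; the hung healer threads show
    # up parked in syncop_getxattr on "trusted.ec.heal", as above.
    gdb -p "$shd_pid" -batch -ex 'thread apply all bt'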

Thanks,
Mohit Agrawal

On Fri 11 Jan, 2019, 21:20 Shyam Ranganathan <srangana@xxxxxxxxxx> wrote:
We can check health on master after the patch lands, as stated by Mohit below.

Release-5 is causing some concern, as we needed to tag the release
yesterday, but the following 2 tests are failing or dumping core
pretty regularly and need attention.

ec/bug-1236065.t
glusterd/add-brick-and-validate-replicated-volume-options.t

Shyam
On 1/10/19 6:20 AM, Mohit Agrawal wrote:
> I think we should consider regression builds after merging the patch
> (https://review.gluster.org/#/c/glusterfs/+/21990/),
> as we know this patch introduced some delay.
>
> Thanks,
> Mohit Agrawal
>
> On Thu, Jan 10, 2019 at 3:55 PM Atin Mukherjee <amukherj@xxxxxxxxxx
> <mailto:amukherj@xxxxxxxxxx>> wrote:
>
>     Mohit, Sanju - request you to investigate the failures related to
>     glusterd and brick-mux and report back to the list.
>
>     On Thu, Jan 10, 2019 at 12:25 AM Shyam Ranganathan
>     <srangana@xxxxxxxxxx <mailto:srangana@xxxxxxxxxx>> wrote:
>
>         Hi,
>
>         As part of branching preparation next week for release-6, please
>         find
>         test failures and respective test links here [1].
>
>         The top tests that are failing/dumping-core are as below and
>         need attention,
>         - ec/bug-1236065.t
>         - glusterd/add-brick-and-validate-replicated-volume-options.t
>         - readdir-ahead/bug-1390050.t
>         - glusterd/brick-mux-validation.t
>         - bug-1432542-mpx-restart-crash.t
>
>         Others of interest,
>         - replicate/bug-1341650.t
>
>         Please file a bug if needed against the test case and report the
>         same
>         here, in case a problem is already addressed, then do send back the
>         patch details that addresses this issue as a response to this mail.
>
>         Thanks,
>         Shyam
>
>         [1] Regression failures:
>         https://hackmd.io/wsPgKjfJRWCP8ixHnYGqcA?view
>         _______________________________________________
>         Gluster-devel mailing list
>         Gluster-devel@xxxxxxxxxxx <mailto:Gluster-devel@xxxxxxxxxxx>
>         https://lists.gluster.org/mailman/listinfo/gluster-devel
>
>

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-devel
