While analysing the logs of the runs where uss.t failed, I made the following observations.
1) In the first iteration of uss.t, the time difference between the first test of the .t file and the last test of the .t file is just within 1 minute, so I think it is the cleanup sequence that is taking the extra time. One reason I suspect this is that we don't see the brick process shutdown message in the logs. (A rough way to check this is sketched right after this list.)
2) The 2nd iteration of uss.t (run because the 1st iteration failed with a timeout) fails because something in the cleanup sequence of the previous iteration did not complete. The volume start command itself fails in the 2nd iteration, and because of that the remaining tests fail as well.
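To confirm where the time is going, one rough check (the log paths here are assumptions based on the excerpts below; adjust them to the actual Jenkins workspace) is to compare the G_LOG timestamps for uss.t in cmd_history.log and see whether the brick log ever records a shutdown:

  # First and last uss.t markers in cmd_history.log; a large gap after the
  # last test of iteration 1 would point at the cleanup sequence.
  grep 'G_LOG:./tests/basic/uss.t' /var/log/glusterfs/cmd_history.log | head -1
  grep 'G_LOG:./tests/basic/uss.t' /var/log/glusterfs/cmd_history.log | tail -1

  # If cleanup really finished, the brick should have logged its shutdown.
  grep -E 'cleanup_and_exit|received signum' \
      /var/log/glusterfs/bricks/d-backends-1-patchy_snap_mnt.log | tail -5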
This is from cmd_history.log
uster.org:/d/backends/2/patchy_snap_mnt builder202.int.aws.gluster.org:/d/backends/3/patchy_snap_mnt ++++++++++
[2019-04-10 19:54:09.145086] : volume create patchy builder202.int.aws.gluster.org:/d/backends/1/patchy_snap_mnt builder202.int.aws.gluster.org:/d/backends/2/patchy_snap_mnt builder202.int.aws.gluster.org:/d/backends/3/patchy_snap_mnt : SUCCESS
[2019-04-10 19:54:09.156221]:++++++++++ G_LOG:./tests/basic/uss.t: TEST: 39 gluster --mode=script --wignore volume set patchy nfs.disable false ++++++++++
[2019-04-10 19:54:09.265138] : volume set patchy nfs.disable false : SUCCESS
[2019-04-10 19:54:09.274386]:++++++++++ G_LOG:./tests/basic/uss.t: TEST: 42 gluster --mode=script --wignore volume start patchy ++++++++++
[2019-04-10 19:54:09.565086] : volume start patchy : FAILED : Commit failed on localhost. Please check log file for details.
[2019-04-10 19:54:09.572753]:++++++++++ G_LOG:./tests/basic/uss.t: TEST: 44 _GFS --attribute-timeout=0 --entry-timeout=0 --volfile-server=builder202.int.aws.gluster.org --volfile-id=patchy /mnt/glusterfs/0 ++++++++++
And this is from the brick log, showing an issue with the export directory not being in a proper state:
[2019-04-10 19:54:09.544476] I [MSGID: 100030] [glusterfsd.c:2857:main] 0-/build/install/sbin/glusterfsd: Started running /build/install/sbin/glusterfsd version 7dev (args: /build/install/sbin/glusterfsd -s builder202.int.aws.gluster.org --volfile-id patchy.builder202.int.aws.gluster.org.d-backends-1-patchy_snap_mnt -p /var/run/gluster/vols/patchy/builder202.int.aws.gluster.org-d-backends-1-patchy_snap_mnt.pid -S /var/run/gluster/7ac65190b72da80a.socket --brick-name /d/backends/1/patchy_snap_mnt -l /var/log/glusterfs/bricks/d-backends-1-patchy_snap_mnt.log --xlator-option *-posix.glusterd-uuid=695c060d-74d3-440e-8cdb-327ec297f2d2 --process-name brick --brick-port 49152 --xlator-option patchy-server.listen-port=49152)
[2019-04-10 19:54:09.549394] I [socket.c:962:__socket_server_bind] 0-socket.glusterfsd: closing (AF_UNIX) reuse check socket 9
[2019-04-10 19:54:09.553190] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2019-04-10 19:54:09.553209] I [MSGID: 101190] [event-epoll.c:680:event_dispatch_epoll_worker] 0-epoll: Started thread with index 0
[2019-04-10 19:54:09.556932] I [rpcsvc.c:2694:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 64
[2019-04-10 19:54:09.557859] E [MSGID: 138001] [index.c:2392:init] 0-patchy-index: Failed to find parent dir (/d/backends/1/patchy_snap_mnt/.glusterfs) of index basepath /d/backends/1/patchy_snap_mnt/.glusterfs/indices. [No such file or directory] ============================> (.glusterfs is absent)
[2019-04-10 19:54:09.557884] E [MSGID: 101019] [xlator.c:629:xlator_init] 0-patchy-index: Initialization of volume 'patchy-index' failed, review your volfile again
[2019-04-10 19:54:09.557892] E [MSGID: 101066] [graph.c:409:glusterfs_graph_init] 0-patchy-index: initializing translator failed
[2019-04-10 19:54:09.557900] E [MSGID: 101176] [graph.c:772:glusterfs_graph_activate] 0-graph: init failed
[2019-04-10 19:54:09.564154] I [io-stats.c:4033:fini] 0-patchy-io-stats: io-stats translator unloaded
[2019-04-10 19:54:09.564748] W [glusterfsd.c:1592:cleanup_and_exit] (-->/build/install/sbin/glusterfsd(mgmt_getspec_cbk+0x806) [0x411f32] -->/build/install/sbin/glusterfsd(glusterfs_process_volfp+0x272) [0x40b9b9] -->/build/install/sbin/glusterfsd(cleanup_and_exit+0x88) [0x4093a5] ) 0-: received signum (-1), shutting down
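Since the index translator fails only because .glusterfs is missing under the brick root, a quick sanity check before the volume start of the next iteration would be something along these lines (purely diagnostic, not part of the test):

  # Does the brick root still exist, and is the .glusterfs directory
  # still present under it?
  for b in /d/backends/{1,2,3}/patchy_snap_mnt; do
      ls -ld "$b" "$b/.glusterfs"
  done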
And this is from the cmd_history.log file of the 2nd iteration of uss.t, from another Jenkins run:
[2019-04-10 15:35:51.927343]:++++++++++ G_LOG:./tests/basic/uss.t: TEST: 39 gluster --mode=script --wignore volume set patchy nfs.disable false ++++++++++
[2019-04-10 15:35:52.038072] : volume set patchy nfs.disable false : SUCCESS
[2019-04-10 15:35:52.057582]:++++++++++ G_LOG:./tests/basic/uss.t: TEST: 42 gluster --mode=script --wignore volume start patchy ++++++++++
[2019-04-10 15:35:52.104288] : volume start patchy : FAILED : Failed to find brick directory /d/backends/1/patchy_snap_mnt for volume patchy. Reason : No such file or directory =========> (export directory is not present)
[2019-04-10 15:35:52.117735]:++++++++++ G_LOG:./tests/basic/uss.t: TEST: 44 _GFS --attribute-timeout=0 --entry-timeout=0 --volfile-server=builder205.int.aws.gluster.org --volfile-id=patchy /mnt/glusterfs/0 ++++++++++
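Here even the brick directory itself is gone. If I understand the snapshot test helpers correctly, patchy_snap_mnt is a mount point they set up, so it is worth checking whether the previous cleanup left it unmounted or removed (again just a diagnostic sketch):

  # Is the snap brick path still a mount point, and does it exist at all?
  mountpoint -q /d/backends/1/patchy_snap_mnt && echo "still mounted"
  ls -ld /d/backends/*/patchy_snap_mnt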
I suspect something is wrong with the cleanup sequence: it causes the timeout of the test in the 1st iteration, and the export directory issues it leaves behind cause the failure of uss.t in the 2nd iteration.
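If that turns out to be the case, one possible hardening (a rough sketch only; the helper name below is hypothetical, not the actual code in tests/include.rc) would be to make the cleanup wait for the brick processes to actually exit before the backend directories are torn down:

  # Hypothetical helper: wait (up to a timeout) for all glusterfsd brick
  # processes to exit, so the next iteration does not race with a dying brick.
  wait_for_bricks_to_exit() {
      local timeout=${1:-30}
      while [ "$timeout" -gt 0 ]; do
          pgrep -f 'glusterfsd .*--process-name brick' >/dev/null || return 0
          sleep 1
          timeout=$((timeout - 1))
      done
      echo "bricks still running after wait" >&2
      return 1
  }

  # ...to be called from the cleanup path before the backend directories
  # under /d/backends are unmounted and removed.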
Regards,
Raghavendra
On Wed, Apr 10, 2019 at 4:07 PM FNU Raghavendra Manjunath <rabhat@xxxxxxxxxx> wrote:
On Wed, Apr 10, 2019 at 9:59 AM Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

And now for last 15 days:

./tests/bitrot/bug-1373520.t 18 ==> Fixed through https://review.gluster.org/#/c/glusterfs/+/22481/, I don't see this failing in brick mux post 5th April

The above patch has been sent to fix the failure with brick mux enabled.

./tests/bugs/ec/bug-1236065.t 17 ==> happens only in brick mux, needs analysis.

./tests/basic/uss.t 15 ==> happens in both brick mux and non brick mux runs, test just simply times out. Needs urgent analysis.

Nothing has changed in snapview-server and snapview-client recently. Looking into it.

./tests/basic/ec/ec-fix-openfd.t 13 ==> Fixed through https://review.gluster.org/#/c/22508/, patch merged today.

./tests/basic/volfile-sanity.t 8 ==> Some race, though this succeeds in second attempt every time.

There're plenty more with 5 instances of failure from many tests. We need all maintainers/owners to look through these failures and fix them, we certainly don't want to get into a stage where master is unstable and we have to lock down the merges till all these failures are resolved. So please help.

(Please note fstat stats show up the retries as failures too which in a way is right)

On Tue, Feb 26, 2019 at 5:27 PM Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:

[1] captures the test failures report since last 30 days and we'd need volunteers/component owners to see why the number of failures are so high against few tests.