Re: [Gluster-Maintainers] NetBSD aborted runs

I am seeing a pause when the .t runs that seems to last close to whatever timeout we pass to EXPECT_WITHIN:

[2016-09-01 03:24:21.852744] I [common.c:1134:pl_does_monkey_want_stuck_lock] 0-patchy-locks: stuck lock
[2016-09-01 03:24:21.852775] W [inodelk.c:659:pl_inode_setlk] 0-patchy-locks: MONKEY LOCKING (forcing stuck lock)! at 2016-09-01 03:24:21
[2016-09-01 03:24:21.852792] I [server-rpc-fops.c:317:server_finodelk_cbk] 0-patchy-server: replied
[2016-09-01 03:24:21.861937] I [server-rpc-fops.c:5682:server3_3_inodelk] 0-patchy-server: inbound
[2016-09-01 03:24:21.862318] I [server-rpc-fops.c:278:server_inodelk_cbk] 0-patchy-server: replied
[2016-09-01 03:24:21.862627] I [server-rpc-fops.c:5682:server3_3_inodelk] 0-patchy-server: inbound <<---- No I/O after this.
[2016-09-01 03:27:19.6N]:++++++++++ G_LOG:tests/features/lock_revocation.t: TEST: 52 append_to_file /mnt/glusterfs/1/testfile ++++++++++
[2016-09-01 03:27:19.871044] I [server-rpc-fops.c:5772:server3_3_finodelk] 0-patchy-server: inbound
[2016-09-01 03:27:19.871280] I [clear.c:219:clrlk_clear_inodelk] 0-patchy-locks: 2
[2016-09-01 03:27:19.871307] I [clear.c:273:clrlk_clear_inodelk] 0-patchy-locks: released_granted
[2016-09-01 03:27:19.871330] I [server-rpc-fops.c:278:server_inodelk_cbk] 0-patchy-server: replied
[2016-09-01 03:27:19.871389] W [inodelk.c:228:__inodelk_prune_stale] 0-patchy-locks: Lock revocation [reason: age; gfid: 3ccca736-ba89-4f8c-ba17-f6cdbcd0e3c3; domain: patchy-replicate-0; age: 178 sec] - Inode lock revoked:  0 granted & 1 blocked locks cleared
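
As a quick sanity check on the excerpt above, here is a minimal Python sketch (not part of the test suite; the helper name gap_seconds is made up for illustration) that measures the gap between the last inbound INODELK and the revocation message. It only shows that the pause lines up with the "age: 178 sec" in the revocation log; it does not say anything about why the pause happens.

```python
from datetime import datetime

# Timestamp layout used in the brick log lines quoted above.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def gap_seconds(earlier: str, later: str) -> float:
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    return delta.total_seconds()


# Both values are copied verbatim from the log excerpt.
last_inbound = "2016-09-01 03:24:21.862627"  # last server3_3_inodelk before the pause
revocation = "2016-09-01 03:27:19.871389"    # __inodelk_prune_stale message

pause = gap_seconds(last_inbound, revocation)
print(f"pause: {pause:.3f} s")  # prints "pause: 178.009 s"
```

So the client-visible pause and the age reported at revocation time agree to within a second.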

We can prevent the hang by adding $CLI volume stop $V0, but then the test fails. When that happens, perfused prints the following error on the console:

perfused: perfuse_node_inactive: perfuse_node_fsync failed error = 57: Resource temporarily unavailable <<--- I wonder whether this happens because the INODELK fop fails with EAGAIN.

I am also seeing weird behaviour: the log says it is releasing granted locks, but then reports that it cleared 1 blocked lock.

+Manu
I think there are 2 things going on here.
1) There is a hang. I am assuming it is a gluster issue until proven otherwise.
2) I need to figure out why the counters printed in the logs are wrong. I went through the code several times and it looks fine. It should have printed that it released 1 granted lock & 0 blocked locks, but it prints the reverse.

If you run git diff on nbslave72.cloud.gluster.org, you can see the changes I made. Could you please help?


On Sun, Aug 28, 2016 at 7:36 AM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
This is still bothering us a lot, and it looks like there is a genuine issue in the code which is causing the process to hang or deadlock?

Raghavendra T - any more findings?


On Friday 19 August 2016, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
NetBSD regressions are getting aborted very frequently. Apart from the infra issue related to connectivity (Nigel has started looking into it), lock_revocation.t hangs in such instances, which causes the run to be aborted after 300 minutes. This has already started blocking patches from getting merged, which eventually impacts the upcoming release cycles.

I'd request the feature owner/maintainer to have a look at it asap.

--Atin


--
--Atin

_______________________________________________
maintainers mailing list
maintainers@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/maintainers




--
Pranith
