Re: NetBSD regression tests hanging after ./tests/basic/mgmt_v3-locks.t

Emmanuel,
I am not sure whether this is feasible, but I wanted to ask: would it be possible to make operations on the mount fail with an error when the mount crashes, instead of hanging? That would save a lot of manual intervention in the future.

Pranith.
On 06/15/2015 01:35 PM, Niels de Vos wrote:
Hi,

sometimes the NetBSD regression tests hang with messages like this:

     [12:29:07] ./tests/basic/mgmt_v3-locks.t
     ........................................... ok    79867 ms
     No volumes present
     mount_nfs: can't access /patchy: Permission denied
     mount_nfs: can't access /patchy: Permission denied
     mount_nfs: can't access /patchy: Permission denied

Most (if not all) of these hangs are caused by a crashing Gluster/NFS
process. Once the Gluster/NFS server is not reachable anymore,
unmounting fails.

The only way to recover is to reboot the VM and retrigger the test. For
rebooting, the http://build.gluster.org/job/reboot-vm job can be used,
and retriggering works by clicking the "retrigger" link in the left menu
once the test has been marked as failed/aborted.

When logging in on the NetBSD system that hangs, you can verify with
these steps:

1. check if there is a /glusterfsd.core file
2. run gdb on the core:

     # cd /build/install
     # gdb --core=/glusterfsd.core sbin/glusterfs
     ...
     Program terminated with signal SIGSEGV, Segmentation fault.
     #0  0xb9b94f0b in auth_cache_lookup (cache=0xb9aa2310, fh=0xb9044bf8,
     host_addr=0xb900e400 "104.130.205.187", timestamp=0xbf7fd900,
     can_write=0xbf7fd8fc)
         at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/nfs/server/src/auth-cache.c:164
     164             *can_write = lookup_res->item->opts->rw;

3. verify the lookup_res structure:

     (gdb) p *lookup_res
     $1 = {timestamp = 1434284981, item = 0xb901e3b0}
     (gdb) p *lookup_res->item
     $2 = {name = 0xffffff00 <error: Cannot access memory at address 0xffffff00>, opts = 0xffffffff}


A fix for this has been sent. It is currently waiting for an update to
the proposed reference counting:

   - http://review.gluster.org/11022
     core: add "gf_ref_t" for common refcounting structures
   - http://review.gluster.org/11023
     nfs: refcount each auth_cache_entry and related data_t
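
The gdb output above (a valid lookup_res whose item points at garbage)
suggests that a cache entry outlived the data it refers to. As a rough
illustration of how reference counting avoids that, here is a minimal,
single-threaded sketch. All names in it (ref_t, ref_get, ref_put,
export_item_t, cache_entry_t) are hypothetical and invented for this
example; it is not the gf_ref_t API from the patches above, and real code
would need atomic or locked counter updates.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcount helper (NOT the proposed gf_ref_t API). */
typedef struct ref {
    unsigned int count;              /* number of holders */
    void (*release)(void *data);     /* called when count drops to 0 */
    void *data;                      /* object that owns this ref_t */
} ref_t;

static void ref_init(ref_t *r, void (*release)(void *), void *data)
{
    r->count = 1;                    /* the creator holds the first reference */
    r->release = release;
    r->data = data;
}

static void ref_get(ref_t *r)
{
    r->count++;
}

/* Returns 1 if this call released the object, 0 otherwise. */
static int ref_put(ref_t *r)
{
    assert(r->count > 0);
    if (--r->count == 0) {
        r->release(r->data);         /* r must not be touched after this */
        return 1;
    }
    return 0;
}

/* Hypothetical stand-ins for the export item and the cache entry. */
typedef struct export_item {
    ref_t ref;
    int rw;
} export_item_t;

typedef struct cache_entry {
    long timestamp;
    export_item_t *item;             /* counted reference, not a borrowed one */
} cache_entry_t;

static int release_count = 0;        /* lets the demo observe the release */

static void item_release(void *data)
{
    release_count++;
    free(data);
}

static export_item_t *item_new(int rw)
{
    export_item_t *item = malloc(sizeof(*item));
    if (item == NULL)
        return NULL;
    item->rw = rw;
    ref_init(&item->ref, item_release, item);
    return item;
}

static cache_entry_t *entry_new(export_item_t *item, long timestamp)
{
    cache_entry_t *entry = malloc(sizeof(*entry));
    if (entry == NULL)
        return NULL;
    ref_get(&item->ref);             /* the cache keeps the item alive */
    entry->timestamp = timestamp;
    entry->item = item;
    return entry;
}

static void entry_free(cache_entry_t *entry)
{
    ref_put(&entry->item->ref);      /* drop the cache's reference */
    free(entry);
}

/* The owner drops its reference while the cache still holds one. */
static int demo_can_write(void)
{
    export_item_t *item = item_new(1);
    if (item == NULL)
        return -1;
    cache_entry_t *entry = entry_new(item, 1434284981L);
    if (entry == NULL) {
        ref_put(&item->ref);
        return -1;
    }
    ref_put(&item->ref);             /* e.g. the owner discards the item */
    int can_write = entry->item->rw; /* still valid: cache holds a reference */
    entry_free(entry);               /* last reference: item is freed here */
    return can_write;
}
```

Calling demo_can_write() reads the rw flag through the cache entry after
the original owner has dropped its reference. Without the ref_get() in
entry_new(), that read would touch freed memory, which is the kind of
access the backtrace shows at auth-cache.c:164.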

Thanks,
Niels
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
