The hang we observe is not specific to Gluster. I've seen this kind of
hang whenever a filesystem that is in use goes offline. For example,
I've accidentally shut down machines that clients were mounting over
NFS, which led to the client systems hanging completely and requiring a
hard reboot.

If there are ways to avoid these kinds of hangs when they eventually
occur, I'm all ears.

On Mon, Jun 15, 2015 at 4:38 PM, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
> Emmanuel,
>        I am not sure of the feasibility but just wanted to ask you. Do you
> think there is a possibility to error out operations on the mount when mount
> crashes instead of hanging? That would prevent a lot of manual intervention
> even in future.
>
> Pranith.
>
> On 06/15/2015 01:35 PM, Niels de Vos wrote:
>>
>> Hi,
>>
>> sometimes the NetBSD regression tests hang with messages like this:
>>
>>    [12:29:07] ./tests/basic/mgmt_v3-locks.t ........................................... ok 79867 ms
>>    No volumes present
>>    mount_nfs: can't access /patchy: Permission denied
>>    mount_nfs: can't access /patchy: Permission denied
>>    mount_nfs: can't access /patchy: Permission denied
>>
>> Most (if not all) of these hangs are caused by a crashing Gluster/NFS
>> process. Once the Gluster/NFS server is not reachable anymore,
>> unmounting fails.
>>
>> The only way to recover is to reboot the VM and retrigger the test. For
>> rebooting, the http://build.gluster.org/job/reboot-vm job can be used,
>> and retriggering works by clicking the "retrigger" link in the left menu
>> once the test has been marked as failed/aborted.
>>
>> When logging in on the NetBSD system that hangs, you can verify with
>> these steps:
>>
>> 1. check if there is a /glusterfsd.core file
>> 2. run gdb on the core:
>>
>>    # cd /build/install
>>    # gdb --core=/glusterfsd.core sbin/glusterfs
>>    ...
>>    Program terminated with signal SIGSEGV, Segmentation fault.
>>    #0  0xb9b94f0b in auth_cache_lookup (cache=0xb9aa2310, fh=0xb9044bf8,
>>        host_addr=0xb900e400 "104.130.205.187", timestamp=0xbf7fd900,
>>        can_write=0xbf7fd8fc)
>>        at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/nfs/server/src/auth-cache.c:164
>>    164         *can_write = lookup_res->item->opts->rw;
>>
>> 3. verify the lookup_res structure:
>>
>>    (gdb) p *lookup_res
>>    $1 = {timestamp = 1434284981, item = 0xb901e3b0}
>>    (gdb) p *lookup_res->item
>>    $2 = {name = 0xffffff00 <error: Cannot access memory at address 0xffffff00>, opts = 0xffffffff}
>>
>>
>> A fix for this has been sent; it is currently waiting for an update to
>> the proposed reference counting:
>>
>> - http://review.gluster.org/11022
>>   core: add "gf_ref_t" for common refcounting structures
>> - http://review.gluster.org/11023
>>   nfs: refcount each auth_cache_entry and related data_t
>>
>> Thanks,
>> Niels
>> _______________________________________________
>> Gluster-devel mailing list
>> Gluster-devel@xxxxxxxxxxx
>> http://www.gluster.org/mailman/listinfo/gluster-devel