df causes hang

Ah! You must be mounting it wrong. Please mount it from a server (not using
a volfile):

mount -t glusterfs SERVER:/vol /mnt

or

glusterfs -s SERVER --volfile-id vol /mnt

That should fix it.
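
For a persistent mount, the same server-based syntax also works in
/etc/fstab - a sketch, assuming the SERVER and volume names above:

SERVER:/vol /mnt glusterfs defaults,_netdev 0 0

The key point is that the first field is a SERVER:/volname pair, not a
path to a volfile.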

Avati

On Thu, Feb 3, 2011 at 7:07 PM, phil cryer <phil at cryer.us> wrote:

> Avati - thanks for your reply, my comments below
>
> >> [name.c:251:af_inet_client_get_remote_sockaddr] glusterfs: DNS
> >> resolution failed on host /etc/glusterfs/glusterfs.vol
>
> > Please make sure you are able to resolve hostnames as given in volume
> > info in all of your servers via 'dig'. The logs clearly show that host
> > resolution seems to be failing.
>
> Agreed, but that doesn't seem to be the issue, because I can dig the
> host named clustr-02 (all of the hosts are defined in my hosts file
> too, so no DNS lookup is even needed), and in fact there are 23 other
> 'bricks' on that host that are working fine (see also the quick
> resolution check after the brick list):
>
> # gluster volume info | grep clustr-02
> Brick2: clustr-02:/mnt/data01
> Brick8: clustr-02:/mnt/data02
> Brick14: clustr-02:/mnt/data03
> Brick20: clustr-02:/mnt/data04
> Brick26: clustr-02:/mnt/data05
> Brick32: clustr-02:/mnt/data06
> Brick38: clustr-02:/mnt/data07
> Brick44: clustr-02:/mnt/data08
> Brick50: clustr-02:/mnt/data09
> Brick56: clustr-02:/mnt/data10
> Brick62: clustr-02:/mnt/data11
> Brick68: clustr-02:/mnt/data12
> Brick74: clustr-02:/mnt/data13
> Brick80: clustr-02:/mnt/data14
> Brick86: clustr-02:/mnt/data15
> Brick92: clustr-02:/mnt/data16
> Brick98: clustr-02:/mnt/data17
> Brick104: clustr-02:/mnt/data18
> Brick110: clustr-02:/mnt/data19
> Brick116: clustr-02:/mnt/data20
> Brick122: clustr-02:/mnt/data21
> Brick128: clustr-02:/mnt/data22
> Brick134: clustr-02:/mnt/data23
> Brick140: clustr-02:/mnt/data24
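>
> To double-check resolution the way glusterfs itself sees it, getent is
> more telling than dig, since getaddrinfo() consults /etc/hosts while dig
> queries DNS only - a quick sketch over all six servers:
>
> for h in clustr-01 clustr-02 clustr-03 clustr-04 clustr-05 clustr-06; do
>     getent hosts $h || echo "$h did not resolve"
> done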
>
> I logged into that host, unmounted that mount, and ran fsck.ext4 on it,
> and it came back clean.
>
> Another thing: the log says "glusterfs: DNS resolution failed on host
> /etc/glusterfs/glusterfs.vol" - however, there is obviously no host
> named /etc/glusterfs/glusterfs.vol. Does this point to an issue?
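>
> If it does, my guess is that something is handing the client that path
> in the server-name position, so it tries to resolve the path as a
> hostname - e.g. a stale 3.0-style fstab entry like this one (a
> hypothetical example, not my actual config):
>
> /etc/glusterfs/glusterfs.vol /mnt/glusterfs glusterfs defaults 0 0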
>
> And lastly, I don't even have a file named /etc/glusterfs/glusterfs.vol:
>
> ls -l /etc/glusterfs
> -rw-r--r-- 1 root root  229 Jan 16 21:15 glusterd.vol
> -rw-r--r-- 1 root root 1908 Jan 16 21:15 glusterfsd.vol.sample
> -rw-r--r-- 1 root root 2005 Jan 16 21:15 glusterfs.vol.sample
>
> I created all of the configs via the gluster> command-line tool.
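>
> For reference, the volume was created with something along these lines
> (a reconstruction - first replica pair shown, the other 142 bricks
> follow in the same order):
>
> gluster volume create bhl-volume replica 2 transport tcp \
>     clustr-01:/mnt/data01 clustr-02:/mnt/data01 # ...and so on
> gluster volume start bhl-volume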
>
> Thanks
>
> P
>
>
>
>
> On Thu, Feb 3, 2011 at 6:39 PM, Anand Avati <anand.avati at gmail.com> wrote:
> > Please make sure you are able to resolve hostnames as given in volume
> > info in all of your servers via 'dig'. The logs clearly show that host
> > resolution seems to be failing.
> > Avati
> >
> > On Thu, Feb 3, 2011 at 1:08 PM, phil cryer <phil at cryer.us> wrote:
> >>
> >> This wasn't my issue, but I'm still having the problem. Today I purged
> >> glusterfs 3.1.1 and installed 3.1.2 fresh from deb. I recreated my
> >> volume and started it, and everything was going fine; I mounted the
> >> share, then ran df -h to see it, and now every few seconds my logs
> >> post this:
> >>
> >> ==> /var/log/glusterfs/nfs.log <==
> >> [2011-02-03 15:55:57.145626] E
> >> [client-handshake.c:1079:client_query_portmap_cbk]
> >> bhl-volume-client-98: failed to get the port number for remote
> >> subvolume
> >> [2011-02-03 15:55:57.145694] I [client.c:1590:client_rpc_notify]
> >> bhl-volume-client-98: disconnected
> >>
> >> ==> /var/log/glusterfs/mnt-glusterfs.log <==
> >> [2011-02-03 15:55:57.605802] E [common-utils.c:124:gf_resolve_ip6]
> >> resolver: getaddrinfo failed (Name or service not known)
> >> [2011-02-03 15:55:57.605834] E
> >> [name.c:251:af_inet_client_get_remote_sockaddr] glusterfs: DNS
> >> resolution failed on host /etc/glusterfs/glusterfs.vol
> >>
> >> over and over. Any clues as to how I can fix this? This one issue has
> >> made our entire 100TB store unusable.
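> >>
> >> From what I can tell, the portmap error means glusterd has no port
> >> registered for that brick - i.e. the brick process for data17 may not
> >> be running on clustr-02. I can check on that host with something like:
> >>
> >> ps ax | grep '[g]lusterfsd' | grep data17
> >> netstat -ntlp | grep glusterfsd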
> >>
> >> and again, gluster volume info shows all the bricks are OK,
> >> including 98:
> >>
> >> gluster> volume info
> >>
> >> Volume Name: bhl-volume
> >> Type: Distributed-Replicate
> >> Status: Started
> >> Number of Bricks: 72 x 2 = 144
> >> Transport-type: tcp
> >> Bricks:
> >> [...]
> >> Brick92: clustr-02:/mnt/data16
> >> Brick93: clustr-03:/mnt/data16
> >> Brick94: clustr-04:/mnt/data16
> >> Brick95: clustr-05:/mnt/data16
> >> Brick96: clustr-06:/mnt/data16
> >> Brick97: clustr-01:/mnt/data17
> >> Brick98: clustr-02:/mnt/data17
> >> Brick99: clustr-03:/mnt/data17
> >> Brick100: clustr-04:/mnt/data17
> >> Brick101: clustr-05:/mnt/data17
> >> Brick102: clustr-06:/mnt/data17
> >> Brick103: clustr-01:/mnt/data18
> >> Brick104: clustr-02:/mnt/data18
> >> Brick105: clustr-03:/mnt/data18
> >> [...]
> >>
> >>
> >> P
> >>
> >>
> >> On Mon, Jan 31, 2011 at 4:26 PM, Anand Avati <anand.avati at gmail.com>
> >> wrote:
> >> > Can you post your server logs? What happens if you run 'df -k' on your
> >> > backend export filesystems?
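> >> > For example, on each server - assuming the bricks are the
> >> > /mnt/dataNN mounts:
> >> >
> >> > df -k /mnt/data*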
> >> >
> >> > Thanks
> >> > Avati
> >> >
> >> > On Mon, Jan 17, 2011 at 5:27 AM, Joe Warren-Meeks
> >> > <joe at encoretickets.co.uk> wrote:
> >> >
> >> >>
> >> >> (Sorry about top-posting.)
> >> >>
> >> >> Just changing the timeout would only mask the problem. The real issue
> >> >> is
> >> >> that running 'df' on either node causes a hang.
> >> >>
> >> >> All other operations seem fine, files can be created and deleted as
> >> >> normal with the results showing up on both.
> >> >>
> >> >> I'd like to work out why it's hanging on df so I can fix it and
> >> >> get my monitoring and cron scripts running again :)
> >> >>
> >> >>  -- joe.
> >> >>
> >> >> -----Original Message-----
> >> >> From: gluster-users-bounces at gluster.org
> >> >> [mailto:gluster-users-bounces at gluster.org] On Behalf Of Daniel Maher
> >> >> Sent: 17 January 2011 12:48
> >> >> To: gluster-users at gluster.org
> >> >> Subject: Re: df causes hang
> >> >>
> >> >> On 01/17/2011 10:47 AM, Joe Warren-Meeks wrote:
> >> >> > Hey chaps,
> >> >> >
> >> >> > Anyone got any pointers as to what this might be? This is still
> >> >> > causing a lot of problems for us whenever we attempt to do df.
> >> >> >
> >> >> >   -- joe.
> >> >> >
> >> >> > -----Original Message-----
> >> >>
> >> >> > However, for some reason, they've got into a bit of a state such
> >> >> > that typing 'df -k' causes both to hang, resulting in a loss of
> >> >> > service for 42 seconds. I see the following messages in the log
> >> >> > files:
> >> >> >
> >> >> >
> >> >>
> >> >> 42 seconds is the default TCP timeout for any given node - you
> >> >> could try tuning that down and seeing how it works for you.
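> >> >>
> >> >> For example - the 42 seconds comes from the network.ping-timeout
> >> >> volume option, which defaults to 42:
> >> >>
> >> >> gluster volume set <volname> network.ping-timeout 10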
> >> >>
> >> >>
> >> >>
> >> >> http://www.gluster.com/community/documentation/index.php/Gluster_3.1:_Setting_Volume_Options
> >> >>
> >> >>
> >> >> --
> >> >> Daniel Maher <dma+gluster AT witbe DOT net>
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> http://philcryer.com
> >
> >
>
>
>
> --
> http://philcryer.com
>

