This wasn't my issue, but I'm still having the issue. Today I purged
glusterfs 3.1.1 and installed 3.1.2 fresh from deb. I recreated my
volume, started it, everything was going fine, mounted the share, then
ran df -h to see it, and now every few seconds my logs post this:

==> /var/log/glusterfs/nfs.log <==
[2011-02-03 15:55:57.145626] E [client-handshake.c:1079:client_query_portmap_cbk] bhl-volume-client-98: failed to get the port number for remote subvolume
[2011-02-03 15:55:57.145694] I [client.c:1590:client_rpc_notify] bhl-volume-client-98: disconnected

==> /var/log/glusterfs/mnt-glusterfs.log <==
[2011-02-03 15:55:57.605802] E [common-utils.c:124:gf_resolve_ip6] resolver: getaddrinfo failed (Name or service not known)
[2011-02-03 15:55:57.605834] E [name.c:251:af_inet_client_get_remote_sockaddr] glusterfs: DNS resolution failed on host /etc/glusterfs/glusterfs.vol

over and over. Any clues as to how I can fix this? This one issue has
made our entire 100TB store unusable.

And again, gluster volume info shows all the bricks are OK, including 98:

gluster> volume info

Volume Name: bhl-volume
Type: Distributed-Replicate
Status: Started
Number of Bricks: 72 x 2 = 144
Transport-type: tcp
Bricks:
[...]
Brick92: clustr-02:/mnt/data16
Brick93: clustr-03:/mnt/data16
Brick94: clustr-04:/mnt/data16
Brick95: clustr-05:/mnt/data16
Brick96: clustr-06:/mnt/data16
Brick97: clustr-01:/mnt/data17
Brick98: clustr-02:/mnt/data17
Brick99: clustr-03:/mnt/data17
Brick100: clustr-04:/mnt/data17
Brick101: clustr-05:/mnt/data17
Brick102: clustr-06:/mnt/data17
Brick103: clustr-01:/mnt/data18
Brick104: clustr-02:/mnt/data18
Brick105: clustr-03:/mnt/data18
[...]

P

On Mon, Jan 31, 2011 at 4:26 PM, Anand Avati <anand.avati at gmail.com> wrote:

> Can you post your server logs? What happens if you run 'df -k' on your
> backend export filesystems?
>
> Thanks
> Avati
>
> On Mon, Jan 17, 2011 at 5:27 AM, Joe Warren-Meeks
> <joe at encoretickets.co.uk> wrote:
>
>>
>> (sorry about topposting.)
>>
>> Just changing the timeout would only mask the problem. The real issue is
>> that running 'df' on either node causes a hang.
>>
>> All other operations seem fine, files can be created and deleted as
>> normal with the results showing up on both.
>>
>> I'd like to work out why it's hanging on df so I can fix it and get my
>> monitoring and cron scripts running again :)
>>
>>  -- joe.
>>
>> -----Original Message-----
>> From: gluster-users-bounces at gluster.org
>> [mailto:gluster-users-bounces at gluster.org] On Behalf Of Daniel Maher
>> Sent: 17 January 2011 12:48
>> To: gluster-users at gluster.org
>> Subject: Re: df causes hang
>>
>> On 01/17/2011 10:47 AM, Joe Warren-Meeks wrote:
>> > Hey chaps,
>> >
>> > Anyone got any pointers as to what this might be? This is still causing
>> > a lot of problems for us whenever we attempt to do df.
>> >
>> >  -- joe.
>> >
>> > -----Original Message-----
>>
>> > However, for some reason, they've got into a bit of a state such that
>> > typing 'df -k' causes both to hang, resulting in a loss of service for 42
>> > seconds. I see the following messages in the log files:
>> >
>> >
>>
>> 42 seconds is the default tcp timeout time for any given node - you
>> could try tuning that down and seeing how it works for you.
>>
>> http://www.gluster.com/community/documentation/index.php/Gluster_3.1:_Setting_Volume_Options
>>
>>
>> --
>> Daniel Maher <dma+gluster AT witbe DOT net>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>
>

--
http://philcryer.com
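
For anyone triaging logs like the ones quoted at the top of this message, here is a minimal sketch (not a Gluster tool) that groups the error lines and pulls out the host named in each "DNS resolution failed" entry. The function names and the regular expressions are my own and assume the line layout seen in this thread. The check for a leading "/" encodes an assumption, not a confirmed diagnosis: a "host" that is really a file path, such as /etc/glusterfs/glusterfs.vol, may mean a volfile path was passed where a server name was expected.

```python
import re
from collections import Counter

# Layout assumed from the log excerpts in this thread:
# [timestamp] LEVEL [file.c:line:function] subject: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^\]]+)\]\s+(?P<subject>[^:]+):\s+(?P<msg>.*)$"
)
DNS_FAIL = re.compile(r"DNS resolution failed on host (?P<host>\S+)")


def summarize(lines):
    """Count 'E' lines per (subject, message) and collect hosts that failed DNS."""
    errors = Counter()
    bad_hosts = set()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m or m.group("level") != "E":
            continue  # skip informational lines and anything unparseable
        errors[(m.group("subject"), m.group("msg"))] += 1
        dns = DNS_FAIL.search(m.group("msg"))
        if dns:
            bad_hosts.add(dns.group("host"))
    return errors, bad_hosts


def looks_like_path(host):
    """Heuristic (assumption): a leading '/' means a file path, not a hostname."""
    return host.startswith("/")


# The four log lines quoted in this message.
SAMPLE = [
    "[2011-02-03 15:55:57.145626] E [client-handshake.c:1079:client_query_portmap_cbk] "
    "bhl-volume-client-98: failed to get the port number for remote subvolume",
    "[2011-02-03 15:55:57.145694] I [client.c:1590:client_rpc_notify] "
    "bhl-volume-client-98: disconnected",
    "[2011-02-03 15:55:57.605802] E [common-utils.c:124:gf_resolve_ip6] "
    "resolver: getaddrinfo failed (Name or service not known)",
    "[2011-02-03 15:55:57.605834] E [name.c:251:af_inet_client_get_remote_sockaddr] "
    "glusterfs: DNS resolution failed on host /etc/glusterfs/glusterfs.vol",
]

if __name__ == "__main__":
    errors, bad_hosts = summarize(SAMPLE)
    for (subject, msg), count in errors.most_common():
        print(f"{count:4d}  {subject}: {msg}")
    for host in sorted(bad_hosts):
        note = " (looks like a file path, not a hostname)" if looks_like_path(host) else ""
        print(f"unresolvable host: {host}{note}")
```

To run it against a real log rather than the sample, pass an open file to summarize(), for example summarize(open("/var/log/glusterfs/mnt-glusterfs.log")).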