>>> However, if I do a gluster volume info I see that it's listed:
>>> # gluster volume info | grep 98
>>> Brick98: clustr-02:/mnt/data17

But now I'm thinking this is wrong, because while it says clustr-02,
the error stops occurring when I stop clustr-03. So how do I really
know not only which host it's on, but which brick each mount is on
(/mnt/data* in my case)? In other words, does

bhl-volume-client-98 != Brick98: clustr-02:/mnt/data17

? And if they don't match, how can I tell which brick
bhl-volume-client-98 actually is?

P
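(A note on the numbering, which may explain the clustr-02/clustr-03
confusion: the client translators in the generated volfile are normally
numbered from zero, while gluster volume info lists bricks starting at
Brick1 - so bhl-volume-client-98 should correspond to the 99th brick,
Brick99, not Brick98. A sketch of how to confirm the mapping directly,
assuming the volume is named bhl-volume and the 3.1-era layout where
glusterd keeps its generated volfiles under /etc/glusterd/vols/ - later
releases moved this to /var/lib/glusterd:

# gluster volume info | grep 'Brick99'
# grep -A 3 'volume bhl-volume-client-98' \
    /etc/glusterd/vols/bhl-volume/bhl-volume-fuse.vol
volume bhl-volume-client-98
    type protocol/client
    option remote-host <host>            # the server that really backs this subvolume
    option remote-subvolume <brick dir>  # e.g. one of the /mnt/data* paths

The <host> and <brick dir> values above are placeholders; the grep
output on a real system shows the actual host and brick directory.)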
On Fri, Feb 4, 2011 at 1:49 PM, phil cryer <phil at cryer.us> wrote:
> On Fri, Feb 4, 2011 at 12:33 PM, Anand Avati <anand.avati at gmail.com> wrote:
>> It is very likely the brick process is failing to start. Please look at the
>> brick log on that server (in /var/log/glusterfs/bricks/*).
>> Avati
>
> Thanks. So if I'm reading it right, 'bhl-volume-client-98' is
> really Brick98: clustr-02:/mnt/data17 - I'm inferring that from this:
>
>>> [2011-02-04 13:09:28.407300] I [client.c:1590:client_rpc_notify]
>>> bhl-volume-client-98: disconnected
>>>
>>> However, if I do a gluster volume info I see that it's listed:
>>> # gluster volume info | grep 98
>>> Brick98: clustr-02:/mnt/data17
>
> But on that server I don't see any issues with that brick starting:
>
> # head mnt-data17.log -n50
> [2011-02-03 23:29:24.235648] W [graph.c:274:gf_add_cmdline_options]
> bhl-volume-server: adding option 'listen-port' for volume
> 'bhl-volume-server' with value '24025'
> [2011-02-03 23:29:24.236017] W
> [rpc-transport.c:566:validate_volume_options] tcp.bhl-volume-server:
> option 'listen-port' is deprecated, preferred is
> 'transport.socket.listen-port', continuing with correction
> Given volfile:
> +------------------------------------------------------------------------------+
>   1: volume bhl-volume-posix
>   2:     type storage/posix
>   3:     option directory /mnt/data17
>   4: end-volume
>   5:
>   6: volume bhl-volume-access-control
>   7:     type features/access-control
>   8:     subvolumes bhl-volume-posix
>   9: end-volume
>  10:
>  11: volume bhl-volume-locks
>  12:     type features/locks
>  13:     subvolumes bhl-volume-access-control
>  14: end-volume
>  15:
>  16: volume bhl-volume-io-threads
>  17:     type performance/io-threads
>  18:     subvolumes bhl-volume-locks
>  19: end-volume
>  20:
>  21: volume /mnt/data17
>  22:     type debug/io-stats
>  23:     subvolumes bhl-volume-io-threads
>  24: end-volume
>  25:
>  26: volume bhl-volume-server
>  27:     type protocol/server
>  28:     option transport-type tcp
>  29:     option auth.addr./mnt/data17.allow *
>  30:     subvolumes /mnt/data17
>  31: end-volume
> +------------------------------------------------------------------------------+
> [2011-02-03 23:29:28.575630] I
> [server-handshake.c:535:server_setvolume] bhl-volume-server: accepted
> client from 128.128.164.219:724
> [2011-02-03 23:29:28.583169] I
> [server-handshake.c:535:server_setvolume] bhl-volume-server: accepted
> client from 127.0.1.1:985
> [2011-02-03 23:29:28.603357] I
> [server-handshake.c:535:server_setvolume] bhl-volume-server: accepted
> client from 128.128.164.218:726
> [2011-02-03 23:29:28.605650] I
> [server-handshake.c:535:server_setvolume] bhl-volume-server: accepted
> client from 128.128.164.217:725
> [2011-02-03 23:29:28.608033] I
> [server-handshake.c:535:server_setvolume] bhl-volume-server: accepted
> client from 128.128.164.215:725
> [2011-02-03 23:29:31.161985] I
> [server-handshake.c:535:server_setvolume] bhl-volume-server: accepted
> client from 128.128.164.74:697
> [2011-02-04 00:40:11.600314] I
> [server-handshake.c:535:server_setvolume] bhl-volume-server: accepted
> client from 128.128.164.74:805
>
> Plus, looking at the tail of this log, it's still working; these latest
> messages (from 4 seconds ago) came in as I was moving some things on the
> cluster:
>
> [2011-02-04 23:13:35.53685] W [server-resolve.c:565:server_resolve]
> bhl-volume-server: pure path resolution for
> /www/d/dasobstdertropen00schrrich (INODELK)
> [2011-02-04 23:13:35.57107] W [server-resolve.c:565:server_resolve]
> bhl-volume-server: pure path resolution for
> /www/d/dasobstdertropen00schrrich (SETXATTR)
> [2011-02-04 23:13:35.59699] W [server-resolve.c:565:server_resolve]
> bhl-volume-server: pure path resolution for
> /www/d/dasobstdertropen00schrrich (INODELK)
>
> Thanks!
>
> P
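(Following up on Avati's suggestion, here is a quick way to double-check
on clustr-02 that the brick process really is up and listening - a
sketch only; port 24025 is taken from the 'listen-port' line in the log
excerpt above, and the grep patterns are illustrative:

# ps ax | grep '[g]lusterfsd' | grep data17
# netstat -ltnp | grep 24025

If a glusterfsd process for /mnt/data17 shows up and is listening on its
advertised port, then the failing subvolume is more likely whichever
brick actually backs client-98 rather than this one.)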
>> On Fri, Feb 4, 2011 at 10:19 AM, phil cryer <phil at cryer.us> wrote:
>>>
>>> I have glusterfs 3.1.2 running on Debian. I'm able to start the volume,
>>> and I can now mount it via mount -t glusterfs and see everything. But I am
>>> still seeing the following error in /var/log/glusterfs/nfs.log:
>>>
>>> [2011-02-04 13:09:16.404851] E
>>> [client-handshake.c:1079:client_query_portmap_cbk]
>>> bhl-volume-client-98: failed to get the port number for remote
>>> subvolume
>>> [2011-02-04 13:09:16.404909] I [client.c:1590:client_rpc_notify]
>>> bhl-volume-client-98: disconnected
>>> [2011-02-04 13:09:20.405843] E
>>> [client-handshake.c:1079:client_query_portmap_cbk]
>>> bhl-volume-client-98: failed to get the port number for remote
>>> subvolume
>>> [2011-02-04 13:09:20.405938] I [client.c:1590:client_rpc_notify]
>>> bhl-volume-client-98: disconnected
>>> [2011-02-04 13:09:24.406634] E
>>> [client-handshake.c:1079:client_query_portmap_cbk]
>>> bhl-volume-client-98: failed to get the port number for remote
>>> subvolume
>>> [2011-02-04 13:09:24.406711] I [client.c:1590:client_rpc_notify]
>>> bhl-volume-client-98: disconnected
>>> [2011-02-04 13:09:28.407249] E
>>> [client-handshake.c:1079:client_query_portmap_cbk]
>>> bhl-volume-client-98: failed to get the port number for remote
>>> subvolume
>>> [2011-02-04 13:09:28.407300] I [client.c:1590:client_rpc_notify]
>>> bhl-volume-client-98: disconnected
>>>
>>> However, if I do a gluster volume info I see that it's listed:
>>> # gluster volume info | grep 98
>>> Brick98: clustr-02:/mnt/data17
>>>
>>> I've gone to that host, unmounted the specific drive, and run fsck.ext4
>>> on it, and it came back clean. Remounting and then restarting gluster on
>>> all the nodes hasn't changed anything; I keep getting that error.
>>> Also, I don't understand why it can't get the port number when it's
>>> working fine on the 23 other bricks (drives) on that server, which
>>> leads me to believe it's not an accurate error.
>>>
>>> I searched the mailing lists and bug tracker, and only found this
>>> similar bug:
>>> http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1640
>>>
>>> Any idea what's going on? Is this just a benign error, since the
>>> cluster still seems to be working, or ?
>>>
>>> Thanks
>>>
>>> P
>>> --
>>> http://philcryer.com
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
> --
> http://philcryer.com

--
http://philcryer.com
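(One last sanity check worth running from the machine that logs the
portmap errors: "failed to get the port number for remote subvolume"
means the client asked glusterd on the brick's host to map that
subvolume to a port and got no answer, so basic TCP reachability
matters. A sketch, with the host and brick port as assumptions -
glusterd's management port defaults to 24007, 24025 is just the brick
port from the clustr-02 log above, and the brick that really backs
client-98 will have its own port and possibly its own host:

# nc -zv clustr-02 24007
# nc -zv clustr-02 24025

If 24007 answers but the portmap query still fails, glusterd most
likely has no port registered for that brick - i.e. the brick process
never came up, which is consistent with Avati's suggestion to check the
brick logs.)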