Re: [Nfs-ganesha-devel] Questions on ganesha HA and shared storage size

Alessandro De Salvo <Alessandro.DeSalvo@xxxxxxxxxxxxx> · Mon, 15 Jun 2015 21:40:41 +0200

Hi Malahal,
I downloaded and used the tarball generated by git for the stable 2.2.0 branch, so it should be consistent. I also used the spec file from EPEL to build the rpms. I’m now going to try your procedure as well, to see if anything changes.
Thanks,

	Alessandro

> On 15 Jun 2015, at 20:10, Malahal Naineni <malahal@xxxxxxxxxx> wrote:
> 
> I am not familiar with tirpc code, but how are you building ganesha
> rpms? Did you do "git submodule update" to get the latest tirpc code
> when you built those rpms? Can somebody familiar with tirpc chime in?
> 
> The one I use is below:
> # git submodule status
> b1a82463c4029315fac085a9d0d6bef766847ed7 src/libntirpc (v1.2.0-2-gb1a8246)
> 
> The way I build ganesha 2.2 rpms is:
> #1. git clone <repo> && git checkout V2.2-stable
> #2. git submodule update --init
> #3. cmake ./src -DDEBUG_SYMS=ON -DUSE_DBUS=ON -DUSE_ADMIN_TOOLS=ON -DUSE_GUI_ADMIN_TOOLS=OFF
> #4. make dist
> #5. rpmbuild --with utils -ta nfs-ganesha-2.2*
> 
> Regards, Malahal.
> PS: there was some effort to package tirpc as an rpm by itself. Not sure
> where that stands.
> 
> Alessandro De Salvo [Alessandro.DeSalvo@xxxxxxxxxxxxx] wrote:
>> OK, thanks, so, any hint on what I could check now?
>> I have tried even without any VFS, so with just the nfs-ganesha rpm installed and with an empty ganesha.conf, but still the same problem. The same configuration with ganesha 2.1.0 was working, on the same server.
>> Any idea? I have sent you the logs but please tell me if you need more.
>> Thanks,
>> 
>> 	Alessandro
>> 
>>> On 15 Jun 2015, at 18:47, Malahal Naineni <malahal@xxxxxxxxxx> wrote:
>>> 
>>> We do run ganesha on RHEL7.0 (same as CentOS7.0), and I don't think 7.1
>>> would be much different. We do run GPFS FSAL only (no VFS_FSAL).
>>> 
>>> Regards, Malahal.
>>> 
>>> Alessandro De Salvo [Alessandro.DeSalvo@xxxxxxxxxxxxx] wrote:
>>>> Hi,
>>>> any news on this? Did you have the chance to look into that?
>>>> I'd also be curious to know whether anyone has tried nfs-ganesha on
>>>> CentOS 7.1 and whether it really worked, as I also tried on a standalone,
>>>> clean machine, and I see the very same behavior, even without gluster.
>>>> Thanks,
>>>> 
>>>> 	Alessandro
>>>> 
>>>> On Fri, 2015-06-12 at 14:34 +0200, Alessandro De Salvo wrote:
>>>>> Hi,
>>>>> having looked at the code and recompiled with some more debug output, I
>>>>> might be wrong, but it seems that in nfs_rpc_dispatcher_thread.c,
>>>>> function nfs_rpc_dequeue_req, the threads enter the while (!(wqe->flags &
>>>>> Wqe_LFlag_SyncDone)) loop and never exit from it.
>>>>> I do not know whether this is normal, as I need to read the code more carefully.
>>>>> Cheers,
>>>>> 
>>>>> 	Alessandro
>>>>> 
>>>>> On Fri, 2015-06-12 at 09:35 +0200, Alessandro De Salvo wrote:
>>>>>> Hi Malahal,
>>>>>> 
>>>>>> 
>>>>>>> On 12 Jun 2015, at 01:23, Malahal Naineni <malahal@xxxxxxxxxx> wrote:
>>>>>>> 
>>>>>>> The logs indicate that ganesha was started successfully without any
>>>>>>> exports.  gstack output seemed normal as well -- threads were waiting to
>>>>>>> serve requests.
>>>>>> 
>>>>>> Yes, no exports as it was the default config before enabling Ganesha on any gluster volume.
>>>>>> 
>>>>>>> 
>>>>>>> Assuming that you are running "showmount -e" on the same system, there
>>>>>>> shouldn't be any firewall coming into the picture.
>>>>>> 
>>>>>> Yes, that was the case in my last attempt, from the same machine. I also tried from another machine, but the result was the same. The firewall (firewalld, as this is CentOS 7.1) is disabled anyway.
>>>>>> 
>>>>>>> If you are running
>>>>>>> "showmount" from some other system, make sure there is no firewall
>>>>>>> dropping the packets.
>>>>>>> 
>>>>>>> I think you need a tcpdump trace to figure out the problem. My wireshark
>>>>>>> trace showed two requests from the client to complete the "showmount -e"
>>>>>>> command:
>>>>>>> 
>>>>>>> 1. Client sent "GETPORT" call to port 111 (rpcbind) to get the port number
>>>>>>> of MOUNT.
>>>>>>> 2. Then it sent "EXPORT" call to mountd port (port it got in response to #1).
>>>>>> 
>>>>>> Yes, I did it already, and indeed it showed the two requests, so the portmapper works fine, but it hangs on the second request.
>>>>>> Also "rpcinfo -t localhost portmapper" returns successfully, while "rpcinfo -t localhost nfs" hangs.
>>>>>> The output of rpcinfo -p is the following:
>>>>>> 
>>>>>>   program vers proto   port  service
>>>>>>   100000    4   tcp    111  portmapper
>>>>>>   100000    3   tcp    111  portmapper
>>>>>>   100000    2   tcp    111  portmapper
>>>>>>   100000    4   udp    111  portmapper
>>>>>>   100000    3   udp    111  portmapper
>>>>>>   100000    2   udp    111  portmapper
>>>>>>   100024    1   udp  56082  status
>>>>>>   100024    1   tcp  41858  status
>>>>>>   100003    3   udp   2049  nfs
>>>>>>   100003    3   tcp   2049  nfs
>>>>>>   100003    4   udp   2049  nfs
>>>>>>   100003    4   tcp   2049  nfs
>>>>>>   100005    1   udp  45611  mountd
>>>>>>   100005    1   tcp  55915  mountd
>>>>>>   100005    3   udp  45611  mountd
>>>>>>   100005    3   tcp  55915  mountd
>>>>>>   100021    4   udp  48775  nlockmgr
>>>>>>   100021    4   tcp  51621  nlockmgr
>>>>>>   100011    1   udp   4501  rquotad
>>>>>>   100011    1   tcp   4501  rquotad
>>>>>>   100011    2   udp   4501  rquotad
>>>>>>   100011    2   tcp   4501  rquotad
>>>>>> 
>>>>>>> 
>>>>>>> What does "rpcinfo -p <server-ip>" show?
>>>>>>> 
>>>>>>> Do you have selinux enabled? I am not sure if that is playing any role
>>>>>>> here...
>>>>>> 
>>>>>> Nope, it's disabled:
>>>>>> 
>>>>>> # uname -a
>>>>>> Linux node2 3.10.0-229.4.2.el7.x86_64 #1 SMP Wed May 13 10:06:09 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>>>>>> 
>>>>>> 
>>>>>> Thanks for the help,
>>>>>> 
>>>>>>   Alessandro
>>>>>> 
>>>>>>> 
>>>>>>> Regards, Malahal.
>>>>>>> 
>>>>>>> Alessandro De Salvo [Alessandro.DeSalvo@xxxxxxxxxxxxx] wrote:
>>>>>>>> Hi,
>>>>>>>> this was an extract from the old logs, before Soumya's suggestion of
>>>>>>>> changing the rquota port in the conf file. The new logs are attached
>>>>>>>> (ganesha-20150611.log.gz) as well as the gstack of the ganesha process
>>>>>>>> while I was executing the hanging showmount
>>>>>>>> (ganesha-20150611.gstack.gz).
>>>>>>>> Thanks,
>>>>>>>> 
>>>>>>>>  Alessandro
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Thu, 2015-06-11 at 11:37 -0500, Malahal Naineni wrote:
>>>>>>>>> Soumya Koduri [skoduri@xxxxxxxxxx] wrote:
>>>>>>>>>> CCing ganesha-devel to get more input.
>>>>>>>>>> 
>>>>>>>>>> When IPv6 is enabled, NFS-Ganesha uses only the v6 interfaces.
>>>>>>>>> 
>>>>>>>>> I am not a network expert, but I have seen IPv4 traffic over an IPv6
>>>>>>>>> interface while fixing a few things before. This may be normal.
>>>>>>>>> 
>>>>>>>>>> Commit d7e8f255 (git show 'd7e8f255'), which was added in v2.2, has more details.
>>>>>>>>>> 
>>>>>>>>>>> # netstat -ltaupn | grep 2049
>>>>>>>>>>> tcp6       4      0 :::2049                 :::*                    LISTEN      32080/ganesha.nfsd
>>>>>>>>>>> tcp6       1      0 x.x.x.2:2049      x.x.x.2:33285     CLOSE_WAIT  -
>>>>>>>>>>> tcp6       1      0 127.0.0.1:2049          127.0.0.1:39555         CLOSE_WAIT  -
>>>>>>>>>>> udp6       0      0 :::2049                 :::*                                32080/ganesha.nfsd
>>>>>>>>>> 
>>>>>>>>>>>>> I have enabled the full debug already, but I see nothing special. Before exporting any volume the log shows no error, even when I do a showmount (the log is attached, ganesha.log.gz). If I do the same after exporting a volume, nfs-ganesha does not even start, complaining that it cannot bind the IPv6 rquota socket, but in fact there is nothing listening on IPv6, so it should not happen:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> tcp6       0      0 :::111                  :::*                    LISTEN      7433/rpcbind
>>>>>>>>>>>>> tcp6       0      0 :::2224                 :::*                    LISTEN      9054/ruby
>>>>>>>>>>>>> tcp6       0      0 :::22                   :::*                    LISTEN      1248/sshd
>>>>>>>>>>>>> udp6       0      0 :::111                  :::*                                7433/rpcbind
>>>>>>>>>>>>> udp6       0      0 fe80::8c2:27ff:fef2:123 :::*                                31238/ntpd
>>>>>>>>>>>>> udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
>>>>>>>>>>>>> udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
>>>>>>>>>>>>> udp6       0      0 fe80::230:48ff:fed2:123 :::*                                31238/ntpd
>>>>>>>>>>>>> udp6       0      0 ::1:123                 :::*                                31238/ntpd
>>>>>>>>>>>>> udp6       0      0 fe80::5484:7aff:fef:123 :::*                                31238/ntpd
>>>>>>>>>>>>> udp6       0      0 :::123                  :::*                                31238/ntpd
>>>>>>>>>>>>> udp6       0      0 :::824                  :::*                                7433/rpcbind
>>>>>>>>>>>>> 
>>>>>>>>>>>>> The error, as shown in the attached ganesha-after-export.log.gz logfile, is the following:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets_V6 :DISP :WARN :Cannot bind RQUOTA tcp6 socket, error 98 (Address already in use)
>>>>>>>>>>>>> 10/06/2015 02:07:47 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] Bind_sockets :DISP :FATAL :Error binding to V6 interface. Cannot continue.
>>>>>>>>>>>>> 10/06/2015 02:07:48 : epoch 55777fb5 : node2 : ganesha.nfsd-26195[main] glusterfs_unload :FSAL :DEBUG :FSAL Gluster unloaded
>>>>>>>>> 
>>>>>>>>> The above messages indicate that someone tried to restart ganesha. But
>>>>>>>>> ganesha failed to come up because RQUOTA port (default is 875) is
>>>>>>>>> already in use by an old ganesha instance or some other program holding
>>>>>>>>> it. The new instance of ganesha will die, but if you are using systemd,
>>>>>>>>> it will try to restart automatically. We have disabled systemd auto
>>>>>>>>> restart in our environment as it was causing issues for debugging.
>>>>>>>>> 
>>>>>>>>> What version of ganesha is this?
>>>>>>>>> 
>>>>>>>>> Regards, Malahal.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> _______________________________________________
>>>>>> Gluster-users mailing list
>>>>>> Gluster-users@xxxxxxxxxxx
>>>>>> http://www.gluster.org/mailman/listinfo/gluster-users
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>> 
> 
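For reference, step 1 of the "showmount -e" exchange described above (asking the portmapper for the MOUNT port) amounts to a lookup in the table that "rpcinfo -p" prints. Below is a minimal sketch in Python, assuming an abridged copy of the listing from this thread as input. The helper name parse_rpcinfo is made up for illustration and is not part of any tool.

```python
# Abridged copy of the "rpcinfo -p" listing quoted in this thread.
RPCINFO_OUTPUT = """\
  program vers proto   port  service
  100000    4   tcp    111  portmapper
  100003    3   tcp   2049  nfs
  100005    1   udp  45611  mountd
  100005    3   udp  45611  mountd
  100005    3   tcp  55915  mountd
  100011    2   tcp   4501  rquotad
"""


def parse_rpcinfo(text):
    """Return a dict mapping (program, version, proto) -> port.

    Hypothetical helper: it only understands the plain column layout
    shown above.
    """
    table = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header line and anything blank or malformed.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        program, version, proto, port = fields[:4]
        table[(int(program), int(version), proto)] = int(port)
    return table


MOUNT_PROGRAM = 100005  # program number listed for "mountd" above

table = parse_rpcinfo(RPCINFO_OUTPUT)
# The port a client would use for the EXPORT call (MOUNT v3 over TCP).
print(table[(MOUNT_PROGRAM, 3, "tcp")])  # prints 55915
```

Since the portmapper answers and the table looks complete, the lookup itself is not the step that hangs; the request sent to the returned port is.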
> 
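The "error 98 (Address already in use)" from Bind_sockets_V6 quoted above is the usual result of binding a port that another socket still holds. Below is a minimal sketch in Python that reproduces the condition in isolation. It is only an illustration, not what ganesha does internally: it assumes IPv4 loopback and a kernel-chosen port rather than the tcp6 RQUOTA socket, and try_bind is a made-up helper name.

```python
import errno
import socket


def try_bind(port, host="127.0.0.1"):
    """Try to bind and listen on a TCP port.

    Returns (socket, 0) on success or (None, errno_value) on failure.
    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock, 0
    except OSError as exc:
        sock.close()
        return None, exc.errno


# First "instance": port 0 lets the kernel pick a free port, which we hold.
first, err = try_bind(0)
port = first.getsockname()[1]

# Second "instance": binding the same port fails while the first is alive.
second, err2 = try_bind(port)
print(port, errno.errorcode.get(err2))

first.close()
```

On Linux, EADDRINUSE is errno 98, which matches the log line. To find the process holding a given port, the netstat invocation already used in this thread, filtered on that port, should show it.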
