Re: NFS-Ganesha lo traffic

On 08/09/2016 09:06 PM, Mahdi Adnan wrote:
Hi,
Thank you for your reply.

The traffic is related to GlusterFS:

18:31:20.419056 IP 192.168.208.134.49058 > 192.168.208.134.49153: Flags
[.], ack 3876, win 24576, options [nop,nop,TS val 247718812 ecr
247718772], length 0
18:31:20.419080 IP 192.168.208.134.49056 > 192.168.208.134.49154: Flags
[.], ack 11625, win 24576, options [nop,nop,TS val 247718812 ecr
247718772], length 0
18:31:20.419084 IP 192.168.208.134.49060 > 192.168.208.134.49152: Flags
[.], ack 9861, win 24576, options [nop,nop,TS val 247718812 ecr
247718772], length 0
18:31:20.419088 IP 192.168.208.134.49054 > 192.168.208.134.49155: Flags
[.], ack 4393, win 24568, options [nop,nop,TS val 247718812 ecr
247718772], length 0
18:31:20.420084 IP 192.168.208.134.49052 > 192.168.208.134.49156: Flags
[.], ack 5525, win 24576, options [nop,nop,TS val 247718813 ecr
247718773], length 0
18:31:20.420092 IP 192.168.208.134.49049 > 192.168.208.134.49158: Flags
[.], ack 6657, win 24576, options [nop,nop,TS val 247718813 ecr
247718773], length 0
18:31:20.421065 IP 192.168.208.134.49050 > 192.168.208.134.49157: Flags
[.], ack 4729, win 24570, options [nop,nop,TS val 247718814 ecr
247718774], length 0


Looks like that is traffic going to the bricks local to that node (the 4915* ports are used by the glusterfs brick processes). It could be from nfs-ganesha or from any other glusterfs client process (the self-heal daemon, etc.). Do you see this traffic even when there is no active I/O from the NFS client? If so, it is probably the self-heal daemon; verify whether there are any files/directories pending heal.
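For example, a quick way to check that (assuming the volume is named "vlm02", as in your export config below) is the heal-info command:

# gluster volume heal vlm02 info

It lists, per brick, the entries that still need healing. If that list stays non-empty while you see the lo traffic with no NFS I/O, the self-heal daemon is the likely source.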

A screenshot from Wireshark can be found in the attachments.
208.134 is the server IP address, and it looks like it is talking to
itself via the lo interface; I'm wondering whether this is normal
behavior or not.
Yes, it is the expected behavior when there are clients actively accessing the volumes.
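If you want to confirm which local processes own those loopback connections, you could list the brick ports and the sockets on them, something like (adjust the port pattern to your brick ports):

# gluster volume status vlm02
# ss -tnp | grep -E ':4915[0-9]'

The first command shows the port and PID of each brick; the second should show ganesha.nfsd and/or glusterfs client processes (such as the self-heal daemon) on the other end of those connections.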

And regarding the Ganesha server logs, how can I debug them to find out why
the server is not responding to the requests on time?

I suggest again taking a tcpdump. Sometimes the nfs-ganesha server (a glusterfs client) may have to communicate with all the bricks over the network (for operations like LOOKUP), and that may result in delays if there are lots of bricks involved. Try capturing packets on the node where the nfs-ganesha server is running and examine the packets between any NFS-client request and its corresponding reply packet.

I usually use the command below to capture packets on all interfaces:
# tcpdump -i any -s 0 -w /var/tmp/nfs.pcap tcp and not port 22
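When reading the capture back, it helps to print absolute timestamps and restrict it to the NFS port so the gap between a request and its reply is easy to spot, for example (assuming NFS on the default port 2049):

# tcpdump -tttt -nn -r /var/tmp/nfs.pcap port 2049

Large gaps between a client call and the matching reply point to the server side; gaps between ganesha and the brick ports point to the gluster backend.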

Thanks,
Soumya


--

Respectfully,
Mahdi A. Mahdi



Subject: Re:  NFS-Ganesha lo traffic
To: mahdi.adnan@xxxxxxxxxxx
From: skoduri@xxxxxxxxxx
CC: gluster-users@xxxxxxxxxxx; nfs-ganesha-devel@xxxxxxxxxxxxxxxxxxxxx
Date: Tue, 9 Aug 2016 18:02:01 +0530



On 08/09/2016 03:33 PM, Mahdi Adnan wrote:
> Hi,
>
> I'm using NFS-Ganesha to access my volume. It's working fine for now, but
> I'm seeing lots of traffic on the loopback interface; in fact, it's the
> same amount of traffic as on the bonding interface. Can anyone please
> explain to me why this is happening?

Could you please capture packets on those interfaces using tcpdump and
examine the traffic?

> Also, I got the following error in the ganesha log file:
>
> 09/08/2016 11:35:54 : epoch 57a5da0c : gfs04 :
> ganesha.nfsd-1646[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health
> status is unhealthy. Not sending heartbeat
> 09/08/2016 11:46:04 : epoch 57a5da0c : gfs04 :
> ganesha.nfsd-1646[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health
> status is unhealthy. Not sending heartbeat
> 09/08/2016 11:54:39 : epoch 57a5da0c : gfs04 :
> ganesha.nfsd-1646[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health
> status is unhealthy. Not sending heartbeat
> 09/08/2016 12:06:04 : epoch 57a5da0c : gfs04 :
> ganesha.nfsd-1646[dbus_heartbeat] dbus_heartbeat_cb :DBUS :WARN :Health
> status is unhealthy. Not sending heartbeat
>
> Is it something I should care about?

The above warnings are thrown when the outstanding RPC request queue count
doesn't change within two heartbeats; in other words, the server may be
taking a while to process the requests and responding slowly to its
clients.
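If you want to corroborate that from the client side, the nfsiostat tool from nfs-utils (if it is installed on the NFS client) reports the average RTT and execution time per operation for each NFS mount, e.g.:

# nfsiostat 5

Consistently high average exe times while those dbus warnings appear would confirm the server is responding slowly.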

Thanks,
Soumya

>
> My ganesha config is the following;
>
>
> EXPORT{
> Export_Id = 1 ;
> Path = "/vlm02";
>
> FSAL {
> name = GLUSTER;
> hostname = "gfs04";
> volume = "vlm02";
> }
>
> Access_type = RW;
> Disable_ACL = TRUE;
> Squash = No_root_squash;
> Protocols = "3" ;
> Transports = "TCP";
> }
>
>
> Im accessing it via a floating ip assigned by CTDB.
>
>
> Thank you.
> --
>
> Respectfully,
> Mahdi A. Mahdi
>
>
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://www.gluster.org/mailman/listinfo/gluster-users
>
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users


