Re: Transport endpoint is not connected : issue

Not sure if this is a good depiction of the issue, since after shutting down all three hosts (two data, one arbiter) we were able to get the duplicate processes per volume to stop.

In any case, here is the ps output. Thanks again:

ps aux |grep gluster

root       3412  0.0  0.3 3870120 205064 ?      Ssl  Aug30   5:47 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

root       5521  1.9  0.0 3169256 63580 ?       Ssl  Aug30 136:01 /usr/sbin/glusterfsd -s fs1-tier3.rrc.local --volfile-id ovirt_engine.fs1-tier3.rrc.local.bricks-brick0-ovirt_engine -p /var/run/gluster/vols/ovirt_engine/fs1-tier3.rrc.local-bricks-brick0-ovirt_engine.pid -S /var/run/gluster/51a5a80d87661c2c4f9479e59a19b7cc.socket --brick-name /bricks/brick0/ovirt_engine -l /var/log/glusterfs/bricks/bricks-brick0-ovirt_engine.log --xlator-option *-posix.glusterd-uuid=ab34955c-a0ba-4f1e-8bac-a448f52e145f --brick-port 49152 --xlator-option ovirt_engine-server.listen-port=49152

root       5528  0.1  0.0 2182576 46092 ?       Ssl  Aug30   8:02 /usr/sbin/glusterfsd -s fs1-tier3.rrc.local --volfile-id ovirt_export.fs1-tier3.rrc.local.bricks-brick1-ovirt_export -p /var/run/gluster/vols/ovirt_export/fs1-tier3.rrc.local-bricks-brick1-ovirt_export.pid -S /var/run/gluster/ea5558bf22be5fae3d6168a3d07415ba.socket --brick-name /bricks/brick1/ovirt_export -l /var/log/glusterfs/bricks/bricks-brick1-ovirt_export.log --xlator-option *-posix.glusterd-uuid=ab34955c-a0ba-4f1e-8bac-a448f52e145f --brick-port 49153 --xlator-option ovirt_export-server.listen-port=49153

root       5538  0.1  0.0 2314168 50512 ?       Ssl  Aug30   8:04 /usr/sbin/glusterfsd -s fs1-tier3.rrc.local --volfile-id ovirt_isos.fs1-tier3.rrc.local.bricks-brick1-ovirt_isos -p /var/run/gluster/vols/ovirt_isos/fs1-tier3.rrc.local-bricks-brick1-ovirt_isos.pid -S /var/run/gluster/25acf05d530c8e041298c362b1589a51.socket --brick-name /bricks/brick1/ovirt_isos -l /var/log/glusterfs/bricks/bricks-brick1-ovirt_isos.log --xlator-option *-posix.glusterd-uuid=ab34955c-a0ba-4f1e-8bac-a448f52e145f --brick-port 49154 --xlator-option ovirt_isos-server.listen-port=49154

root       5549  0.0  0.0 1895584 47136 ?       Ssl  Aug30   1:20 /usr/sbin/glusterfsd -s fs1-tier3.rrc.local --volfile-id ovirt_mmpf_samba.fs1-tier3.rrc.local.bricks-brick2-ovirt_mmpf_samba -p /var/run/gluster/vols/ovirt_mmpf_samba/fs1-tier3.rrc.local-bricks-brick2-ovirt_mmpf_samba.pid -S /var/run/gluster/a65c4e775a4fb7bbaccb8807de3e1413.socket --brick-name /bricks/brick2/ovirt_mmpf_samba -l /var/log/glusterfs/bricks/bricks-brick2-ovirt_mmpf_samba.log --xlator-option *-posix.glusterd-uuid=ab34955c-a0ba-4f1e-8bac-a448f52e145f --brick-port 49155 --xlator-option ovirt_mmpf_samba-server.listen-port=49155

root       5559 19.9  0.0 3169256 63020 ?       Ssl  Aug30 1375:48 /usr/sbin/glusterfsd -s fs1-tier3.rrc.local --volfile-id ovirt_vms.fs1-tier3.rrc.local.bricks-brick1-ovirt_vms -p /var/run/gluster/vols/ovirt_vms/fs1-tier3.rrc.local-bricks-brick1-ovirt_vms.pid -S /var/run/gluster/8bd29ece67b8bb364fa9038d630c5a26.socket --brick-name /bricks/brick1/ovirt_vms -l /var/log/glusterfs/bricks/bricks-brick1-ovirt_vms.log --xlator-option *-posix.glusterd-uuid=ab34955c-a0ba-4f1e-8bac-a448f52e145f --brick-port 49156 --xlator-option ovirt_vms-server.listen-port=49156

root     190876 28.4  0.1 1454264 83836 ?       Ssl  07:59  17:44 /usr/sbin/glusterfsd -s fs1-tier3.rrc.local --volfile-id ccts_oracle.fs1-tier3.rrc.local.bricks-brick3-ccts_oracle -p /var/run/gluster/vols/ccts_oracle/fs1-tier3.rrc.local-bricks-brick3-ccts_oracle.pid -S /var/run/gluster/db718796520ecaf218d3889c9af2d3a5.socket --brick-name /bricks/brick3/ccts_oracle -l /var/log/glusterfs/bricks/bricks-brick3-ccts_oracle.log --xlator-option *-posix.glusterd-uuid=ab34955c-a0ba-4f1e-8bac-a448f52e145f --brick-port 49157 --xlator-option ccts_oracle-server.listen-port=49157

root     190901  0.0  0.0 2318040 21636 ?       Ssl  07:59   0:01 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/1e9d2979118671294386bfac399847c5.socket --xlator-option *replicate*.node-uuid=ab34955c-a0ba-4f1e-8bac-a448f52e145f

root     195210  0.0  0.0 112708   964 pts/0    S+   09:01   0:00 grep --color=auto gluster
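For what it's worth, a quick way to check whether any volume still has a duplicate brick process is to count glusterfsd instances per --volfile-id; this is just a rough sketch against the default process naming, and any count above 1 would indicate a duplicate:

# count brick processes per volume; each --volfile-id should appear exactly once
ps -eo args | grep '[g]lusterfsd' | grep -o -e '--volfile-id [^ ]*' | sort | uniq -c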

From: Karthik Subrahmanya <ksubrahm@xxxxxxxxxx>
Date: Monday, September 3, 2018 at 6:36 AM
To: "Johnson, Tim" <tjj@xxxxxxx>
Cc: Atin Mukherjee <amukherj@xxxxxxxxxx>, Ravishankar N <ravishankar@xxxxxxxxxx>, gluster-users <gluster-users@xxxxxxxxxxx>, "Chlipala, George Edward" <gchlip2@xxxxxxx>
Subject: Re: [Gluster-users] Transport endpoint is not connected : issue

 

 

On Mon, Sep 3, 2018 at 11:17 AM Karthik Subrahmanya <ksubrahm@xxxxxxxxxx> wrote:

Hey,

 

We need some more information to debug this.

I think you missed sending the output of 'gluster volume info <volname>'.

Can you also provide the brick, shd, and glfsheal logs?

How many peers are present in the setup? You also mentioned that "one of the file servers has two processes for each of the volumes instead of one per volume"; which processes are you referring to here?

Also provide the "ps aux | grep gluster" output.
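For reference, a rough sketch of commands that would collect what is being asked for here. The volume name ovirt_engine, the brick log directory, and the glustershd log path are taken from elsewhere in this thread; the glfsheal log name is an assumption based on the default naming, so adjust for your setup:

gluster volume info ovirt_engine
gluster peer status
ps aux | grep gluster
# brick, shd and glfsheal logs from their default locations
tar czf gluster-logs.tar.gz /var/log/glusterfs/bricks/ /var/log/glusterfs/glustershd.log /var/log/glusterfs/glfsheal-*.log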

 

Regards,

Karthik

 

On Sat, Sep 1, 2018 at 12:10 AM Johnson, Tim <tjj@xxxxxxx> wrote:

Thanks for the reply.

 

   I have attached the gluster.log file from the host where it is currently happening.

The host it happens on does change.

 

Thanks.

 

From: Atin Mukherjee <amukherj@xxxxxxxxxx>
Date: Friday, August 31, 2018 at 1:03 PM
To: "Johnson, Tim" <tjj@xxxxxxx>
Cc: Karthik Subrahmanya <ksubrahm@xxxxxxxxxx>, Ravishankar N <ravishankar@xxxxxxxxxx>, "gluster-users@xxxxxxxxxxx" <gluster-users@xxxxxxxxxxx>
Subject: Re: [Gluster-users] Transport endpoint is not connected : issue

 

Can you please pass along all the gluster log files from the server where the “transport endpoint is not connected” error is reported? As restarting glusterd didn’t solve this issue, I believe this isn’t a stale port problem but something else. Also please provide the output of ‘gluster v info <volname>’.

 

(@cc Ravi, Karthik)

 

On Fri, 31 Aug 2018 at 23:24, Johnson, Tim <tjj@xxxxxxx> wrote:

Hello all,

 

      We have Gluster replicate (with arbiter) volumes that are reporting “Transport endpoint is not connected” on a rotating basis from each of the two file servers and from a third host that holds the arbiter bricks.

This happens when trying to run a heal on all the volumes on the Gluster hosts. When I get the status of all the volumes, everything looks good.

       This behavior seems to be a foreshadowing of the Gluster volumes becoming unresponsive to our VM cluster. In addition, one of the file servers has two processes for each of the volumes instead of one per volume. Eventually the affected file server will drop off the listed peers. Restarting glusterd/glusterfsd on the affected file server does not take care of the issue; we have to bring down both file servers because the volumes are no longer seen by the VM cluster after the errors start occurring. I had seen bug reports about “Transport endpoint is not connected” in earlier versions of Gluster, but had thought that it had been addressed.

     Dmesg did have some entries for “possible SYN flood on port *”, so we set the sysctl “net.ipv4.tcp_max_syn_backlog = 2048”, which seemed to stop the SYN flood messages but not the underlying volume issues.
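For reference, a minimal sketch of how that sysctl can be applied both at runtime and persistently; the 2048 value is the one mentioned above and the drop-in file name is just an example:

# apply immediately (runtime only)
sysctl -w net.ipv4.tcp_max_syn_backlog=2048
# persist across reboots (example file name)
echo 'net.ipv4.tcp_max_syn_backlog = 2048' > /etc/sysctl.d/90-syn-backlog.conf
sysctl --system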

    I have put the versions of all the installed Gluster packages below, as well as the output of the “heal” and “status” commands for the volumes.

 

       This has just started happening, but I cannot definitively say whether it began after an update or not.

      

 

Thanks for any assistance.

 

 

Running heal:

 

gluster volume heal ovirt_engine info

Brick ****1.rrc.local:/bricks/brick0/ovirt_engine

Status: Connected

Number of entries: 0

 

Brick ****3.rrc.local:/bricks/brick0/ovirt_engine

Status: Transport endpoint is not connected

Number of entries: -

 

Brick *****3.rrc.local:/bricks/arb-brick/ovirt_engine

Status: Transport endpoint is not connected

Number of entries: -

 

 

Running status:

 

gluster volume status ovirt_engine

Status of volume: ovirt_engine

Gluster process                             TCP Port  RDMA Port  Online  Pid

------------------------------------------------------------------------------

Brick*****.rrc.local:/bricks/brick0/ovirt_engine        49152     0          Y       5521

Brick fs2-tier3.rrc.local:/bricks/brick0/ovirt_engine   49152     0          Y       6245

Brick ****.rrc.local:/bricks/arb-brick/ovirt_engine     49152     0          Y       3526

Self-heal Daemon on localhost               N/A       N/A        Y       5509

Self-heal Daemon on ***.rrc.local     N/A       N/A        Y       6218

Self-heal Daemon on ***.rrc.local       N/A       N/A        Y       3501

Self-heal Daemon on ****.rrc.local N/A       N/A        Y       3657

Self-heal Daemon on *****.rrc.local   N/A       N/A        Y       3753

Self-heal Daemon on ****.rrc.local N/A       N/A        Y       17284

 

Task Status of Volume ovirt_engine

------------------------------------------------------------------------------

There are no active volume tasks
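Since heal info reports the endpoint as not connected while volume status shows the bricks online, one quick check from the affected node is whether the glusterd management port (24007) and the brick port are actually reachable on the peers. A minimal sketch, with the hostnames and brick port 49152 taken from the output above (replace the arbiter placeholder with the real host):

for host in fs1-tier3.rrc.local fs2-tier3.rrc.local <arbiter-host>; do
  for port in 24007 49152; do
    timeout 3 bash -c "echo > /dev/tcp/$host/$port" 2>/dev/null \
      && echo "$host:$port reachable" || echo "$host:$port NOT reachable"
  done
done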


/etc/glusterd.vol:

 

 

volume management

    type mgmt/glusterd

    option working-directory /var/lib/glusterd

    option transport-type socket,rdma

    option transport.socket.keepalive-time 10

    option transport.socket.keepalive-interval 2

    option transport.socket.read-fail-log off

    option ping-timeout 0

    option event-threads 1

    option rpc-auth-allow-insecure on

#   option transport.address-family inet6

#   option base-port 49152

end-volume


rpm -qa |grep gluster

glusterfs-3.12.13-1.el7.x86_64

glusterfs-gnfs-3.12.13-1.el7.x86_64

glusterfs-api-3.12.13-1.el7.x86_64

glusterfs-cli-3.12.13-1.el7.x86_64

glusterfs-client-xlators-3.12.13-1.el7.x86_64

glusterfs-fuse-3.12.13-1.el7.x86_64

centos-release-gluster312-1.0-2.el7.centos.noarch

glusterfs-rdma-3.12.13-1.el7.x86_64

glusterfs-libs-3.12.13-1.el7.x86_64

glusterfs-server-3.12.13-1.el7.x86_64

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users

--

- Atin (atinm)

