Re: RE : Frequent connect and disconnect messages flooded in logs

Mohammed Rafi K C <rkavunga@xxxxxxxxxx> · Mon, 19 Dec 2016 20:39:40 +0530

    Hi Micha,
    Can you please also check whether there are any error messages in dmesg?
      Basically, I'm trying to see whether you're hitting the issue described
      in https://bugzilla.kernel.org/show_bug.cgi?id=73831 .
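
One way to run this check is to filter the kernel log for hung-task warnings such as the "blocked for more than 120 seconds" message quoted further down in this thread. The following is only a sketch: the function name `hung_tasks` is made up for illustration, and it matches only that one message format, so other kernel errors would still need a manual look through the dmesg output.

```shell
#!/bin/sh
# hung_tasks: read kernel log text on stdin and print one line per
# hung-task warning in the form "<task> <pid> <seconds>".
# (Hypothetical helper name; it only matches the "INFO: task ... blocked
# for more than N seconds" format quoted in this thread.)
hung_tasks() {
    sed -n 's/.*INFO: task \([^:]*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2 \3/p'
}

# Usage on a live system (not run here):
#   dmesg | hung_tasks
```

An empty result only means that no hung-task warnings were found; it says nothing about other kinds of kernel errors.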

    Regards
    Rafi KC

    On 12/19/2016 11:58 AM, Mohammed Rafi K C wrote:

      Hi Micha,
      Sorry for the late reply. I was busy with some other things.
      If you still have the setup available, can you enable the TRACE log
        level [1],[2] and see whether you can find any log entries from when
        the network starts disconnecting? Basically, I'm trying to find out
        whether any disconnection occurred for a reason other than the ping
        timer expiry.

      [1] : gluster volume set <volname> diagnostics.brick-log-level TRACE
      [2] : gluster volume set <volname> diagnostics.client-log-level TRACE
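
For reference, both options are applied with `gluster volume set`, and TRACE is verbose enough that it is worth switching back once a disconnect has been captured. A minimal sketch, assuming a volume named gv0 as in this thread; the helper name `set_gluster_log_level` and the `GLUSTER` override variable are made up for illustration (the override only exists so the commands can be printed instead of executed):

```shell
#!/bin/sh
# Hypothetical helper: set the brick and client log level for one volume.
# GLUSTER can be overridden (e.g. GLUSTER=echo) for a dry run.
GLUSTER=${GLUSTER:-gluster}

set_gluster_log_level() {
    vol=$1
    level=$2
    "$GLUSTER" volume set "$vol" diagnostics.brick-log-level "$level" &&
    "$GLUSTER" volume set "$vol" diagnostics.client-log-level "$level"
}

# Usage (not run here):
#   set_gluster_log_level gv0 TRACE   # while reproducing the disconnects
#   set_gluster_log_level gv0 INFO    # afterwards, back to the default level
```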

      Regards
      Rafi KC

      On 12/08/2016 07:59 PM, Atin Mukherjee wrote:

            On Thu, Dec 8, 2016 at 4:37 PM, Micha Ober <micha2k@xxxxxxxxx> wrote:

                    Hi Rafi,

                    thank you for your support. It is greatly
                    appreciated.

                    Just some more thoughts from my side:

                    There have been no reports from other users in
                    *this* thread until now, but I have found at least
                    one user with a very similar problem in an older
                    thread:

                    https://www.gluster.org/pipermail/gluster-users/2014-November/019637.html

                    He is also reporting disconnects for no apparent
                    reason, although his setup is a bit more
                    complicated, also involving a firewall. In our
                    setup, all servers/clients are connected via 1 GbE
                    with no firewall or anything that might
                    block/throttle traffic. Also, we are using exactly
                    the same software versions on all nodes.

                    I can also find some reports in the bugtracker when
                    searching for "rpc_client_ping_timer_expired"
                    and "rpc_clnt_ping_timer_expired" (it looks like the
                    spelling changed between versions).

                    https://bugzilla.redhat.com/show_bug.cgi?id=1096729

              Just FYI, this is a different issue, here GlusterD
                fails to handle the volume of incoming requests on time
                since MT-epoll is not enabled here.

                    https://bugzilla.redhat.com/show_bug.cgi?id=1370683

                    But both reports involve heavy traffic/load on the
                    bricks/disks, which is not the case for our setup.

                    To give a ballpark figure: Over three days, 30 GiB
                    were written. And the data was not written at once,
                    but continuously over the whole time.

                    Just to be sure, I have just checked the logfiles of
                    one of the other clusters, which is sitting in the
                    same building, in the same rack, even on the same
                    switch, running the same jobs, but with glusterfs
                    3.4.2, and I can see no disconnects in the logfiles.
                    So I can definitely rule out our infrastructure as
                    the problem.

                    Regards,

                    Micha

                        On 07.12.2016 at 18:08, Mohammed Rafi K C wrote:

                        Hi Micha,
                        This is great. I will provide you with a debug
                          build containing two fixes for what I suspect
                          may be causing the frequent disconnects,
                          though I don't have much data to validate my
                          theory. So I will take one more day to dig
                          into that.
                        Thanks for your support, and opensource++
                        Regards
                        Rafi KC

                        On 12/07/2016 05:02 AM, Micha Ober wrote:

                          Hi,

                            thank you for your answer and even more for
                            the question!

                            Until now, I was using FUSE. Today I changed
                            all mounts to NFS using the same 3.7.17
                            version.

                            But: The problem is still the same. Now, the
                            NFS logfile contains lines like these:

                            [2016-12-06 15:12:29.006325] C [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] 0-gv0-client-7: server X.X.18.62:49153 has not responded in the last 42 seconds, disconnecting.
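
To see which connections are hit most often, entries like this can be tallied per server address. A rough sketch, assuming each log entry sits on a single line in the log file; the function name `count_ping_timeouts` is made up for illustration:

```shell
#!/bin/sh
# count_ping_timeouts: read a glusterfs client/NFS log on stdin and print
# "<count> <server:port>" for every server that hit the ping timeout.
count_ping_timeouts() {
    awk '/rpc_clnt_ping_timer_expired/ {
        # the address is the field right after the word "server"
        for (i = 1; i < NF; i++) {
            if ($i == "server") { count[$(i + 1)]++; break }
        }
    }
    END { for (s in count) print count[s], s }' | sort -k1,1nr -k2,2
}

# Usage (the log path is an assumption; it depends on the installation):
#   count_ping_timeouts < /var/log/glusterfs/nfs.log
```

Each brick listens on its own port, so the port in the output indicates which brick process on that server stopped responding.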

                            Interestingly enough, the IP address
                            X.X.18.62 belongs to the same machine! As I wrote
                            earlier, each node serves both as a server
                            and a client, as each node contributes
                            bricks to the volume. Every server is
                            connecting to itself via its hostname. For
                            example, the fstab on the node "giant2"
                            looks like:

                            #giant2:/gv0    /shared_data    glusterfs       defaults,noauto 0       0
                            #giant2:/gv2    /shared_slurm   glusterfs       defaults,noauto 0       0
                            giant2:/gv0     /shared_data    nfs             defaults,_netdev,vers=3 0       0
                            giant2:/gv2     /shared_slurm   nfs             defaults,_netdev,vers=3 0       0

                            So I understand the disconnects even less. 

                            I don't know if it's possible to create a
                            dummy cluster which exhibits the same
                            behaviour, because the disconnects only
                            happen when there are compute jobs running
                            on those nodes - and they are GPU compute
                            jobs, so that's something which cannot be
                            easily emulated in a VM.

                            As we have more clusters (which are running
                            fine with an ancient 3.4 version :-)) and we
                            are currently not dependent on this
                            particular cluster (which may stay like this
                            for this month, I think) I should be able to
                            deploy the debug build on the "real"
                            cluster, if you can provide a debug build.

                            Regards and thanks,

                            Micha

                            On 06.12.2016 at 08:15, Mohammed Rafi K C wrote:

                            On 12/03/2016 12:56 AM, Micha Ober wrote:

                              **Update:** I have downgraded from 3.8.6
                                  to 3.7.17 now, but the problem still
                                  exists.

                                Client log: http://paste.ubuntu.com/23569065/

                                Brick log: http://paste.ubuntu.com/23569067/

                                Please note that each server has two
                                  bricks. According to the logs,
                                  however, one brick loses the
                                  connection to all other hosts:

                                [2016-12-02 18:38:53.703301] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.219:49121 failed (Broken pipe)
[2016-12-02 18:38:53.703381] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.62:49118 failed (Broken pipe)
[2016-12-02 18:38:53.703380] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.107:49121 failed (Broken pipe)
[2016-12-02 18:38:53.703424] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.206:49120 failed (Broken pipe)
[2016-12-02 18:38:53.703359] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.58:49121 failed (Broken pipe)

The SECOND brick on the SAME host is NOT affected, i.e. no disconnects!
As I said, the network connection is fine and the disks are idle.
The CPU always has 2 free cores.

It looks like I have to downgrade to 3.4 now in order for the disconnects to stop.
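
The pattern described here, one brick dropping its connections to every other host at the same moment, can be made visible by grouping the "Broken pipe" warnings by second and counting the distinct peers. A small sketch, assuming one log entry per line; the function name `broken_pipe_bursts` is made up for illustration:

```shell
#!/bin/sh
# broken_pipe_bursts: read a brick log on stdin and print
# "<date> <hh:mm:ss> <number of distinct peer hosts>" for each second in
# which writev failed with "Broken pipe".
broken_pipe_bursts() {
    awk '/writev on / && /Broken pipe/ {
        ts = substr($1, 2) " " substr($2, 1, 8)   # drop "[" and the fraction
        for (i = 1; i < NF; i++) {
            if ($i == "on") {
                split($(i + 1), addr, ":")          # addr[1] = host, addr[2] = port
                key = ts SUBSEP addr[1]
                if (!(key in seen)) {
                    seen[key] = 1
                    peers[ts]++
                }
                break
            }
        }
    }
    END { for (t in peers) print t, peers[t] }' | sort
}
```

Running it over both brick logs of the same host would be one way to confirm that only one of the two bricks shows such bursts.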

                            Hi Micha,

                            Thanks for the update, and sorry for what
                            happened with the newer gluster versions. I
                            can understand the need to downgrade, as it
                            is a production setup.

                            Can you tell me which clients are used here:
                            fuse, nfs, nfs-ganesha, smb or libgfapi?

                            Since I'm not able to reproduce the issue (I
                            have been trying for the last 3 days) and the
                            logs are not very helpful here (we don't log
                            much in the socket layer), could you please
                            create a dummy cluster and try to reproduce
                            the issue? If that works, we can experiment
                            with that volume, and I could provide a
                            debug build to use for further debugging.

                            If you don't have bandwidth for this, please
                            leave it ;).

                            Regards

                            Rafi KC

                                - Micha

                                On 30.11.2016 at 06:57, Mohammed Rafi K C wrote:

                                Hi Micha,
                                I have changed the thread and subject
                                  so that your original thread stays
                                  focused on your original query. I have
                                  started this new thread to discuss the
                                  frequent disconnect problem you
                                  observed with 3.8.4, so let's try to
                                  fix it here.
                                If anyone else has experienced
                                    the same problem, please respond to
                                    this mail.

                                It would be very helpful if you could
                                  give us some more logs from clients
                                  and bricks. Also, any steps to
                                  reproduce will surely help to chase
                                  the problem further.
                                Regards
                                Rafi KC

                                On 11/30/2016 04:44 AM, Micha Ober wrote:

                                      I had opened another thread on this mailing list
                                      (Subject: "After upgrade from 3.4.2 to 3.8.5 - High CPU
                                      usage resulting in disconnects and split-brain").

                                      The title may be a bit misleading now, as I am no longer
                                      observing high CPU usage after upgrading to 3.8.6, but
                                      the disconnects are still happening and the number of
                                      files in split-brain is growing.

                                      Setup: 6 compute nodes, each serving as a glusterfs
                                      server and client, Ubuntu 14.04, two bricks per node,
                                      distribute-replicate

                                      I have two gluster volumes set up (one for scratch data,
                                      one for the slurm scheduler). Only the scratch data
                                      volume shows critical errors "[...] has not responded in
                                      the last 42 seconds, disconnecting.". So I can rule out
                                      network problems; the gigabit link between the nodes is
                                      not saturated at all. The disks are almost idle (<10%).

                                      I have glusterfs 3.4.2 on Ubuntu 12.04 on another
                                      compute cluster, running fine since it was deployed.
                                      I had glusterfs 3.4.2 on Ubuntu 14.04 on this cluster,
                                      running fine for almost a year.

                                      After upgrading to 3.8.5, the problems (as described)
                                      started. I would like to use some of the new features of
                                      the newer versions (like bitrot), but the users can't
                                      run their compute jobs right now because the result
                                      files are garbled.

                                      There also seems to be a bug report with a similar
                                      problem (but no progress):
                                      https://bugzilla.redhat.com/show_bug.cgi?id=1370683

                                      For me, ALL servers are affected (not isolated to one or
                                      two servers).

                                      I also see messages like "INFO: task gpu_graphene_bv:4476
                                      blocked for more than 120 seconds." in the syslog.

                                      For completeness (gv0 is the scratch volume, gv2 the
                                      slurm volume):

                                      [root@giant2: ~]# gluster v info

                                      Volume Name: gv0
                                      Type: Distributed-Replicate
                                      Volume ID: 993ec7c9-e4bc-44d0-b7c4-2d977e622e86
                                      Status: Started
                                      Snapshot Count: 0
                                      Number of Bricks: 6 x 2 = 12
                                      Transport-type: tcp
                                      Bricks:
                                      Brick1: giant1:/gluster/sdc/gv0
                                      Brick2: giant2:/gluster/sdc/gv0
                                      Brick3: giant3:/gluster/sdc/gv0
                                      Brick4: giant4:/gluster/sdc/gv0
                                      Brick5: giant5:/gluster/sdc/gv0
                                      Brick6: giant6:/gluster/sdc/gv0
                                      Brick7: giant1:/gluster/sdd/gv0
                                      Brick8: giant2:/gluster/sdd/gv0
                                      Brick9: giant3:/gluster/sdd/gv0
                                      Brick10: giant4:/gluster/sdd/gv0
                                      Brick11: giant5:/gluster/sdd/gv0
                                      Brick12: giant6:/gluster/sdd/gv0
                                      Options Reconfigured:
                                      auth.allow: X.X.X.*,127.0.0.1
                                      nfs.disable: on

                                      Volume Name: gv2
                                      Type: Replicate
                                      Volume ID: 30c78928-5f2c-4671-becc-8deaee1a7a8d
                                      Status: Started
                                      Snapshot Count: 0
                                      Number of Bricks: 1 x 2 = 2
                                      Transport-type: tcp
                                      Bricks:
                                      Brick1: giant1:/gluster/sdd/gv2
                                      Brick2: giant2:/gluster/sdd/gv2
                                      Options Reconfigured:
                                      auth.allow: X.X.X.*,127.0.0.1
                                      cluster.granular-entry-heal: on
                                      cluster.locking-scheme: granular
                                      nfs.disable: on

                                    2016-11-30 0:10 GMT+01:00 Micha Ober <micha2k@xxxxxxxxx>:

                                              2016-11-29
                                                19:21 GMT+01:00 Micha
                                                Ober <micha2k@xxxxxxxxx>:

                                                        2016-11-29 18:53 GMT+01:00 Atin Mukherjee <amukherj@xxxxxxxxxx>:

                                                          Would you be able to share what is not working for you in 3.8.x
                                                          (please mention the exact version)? 3.4 is quite old, and falling
                                                          back to an unsupported version doesn't look like a feasible option.

                                                          On Tue, 29 Nov 2016 at 17:01, Micha Ober <micha2k@xxxxxxxxx> wrote:

                                                          Hi,

                                                          I was using gluster 3.4 and upgraded to 3.8, but that version proved to be unusable for me. I now need to downgrade.

                                                          I'm running Ubuntu 14.04. As upgrades of the op version are irreversible, I guess I have to delete all gluster volumes and re-create them with the downgraded version.

                                                          0. Back up data
                                                          1. Unmount all gluster volumes
                                                          2. apt-get purge glusterfs-server glusterfs-client
                                                          3. Remove PPA for 3.8
                                                          4. Add PPA for older version
                                                          5. apt-get install glusterfs-server glusterfs-client
                                                          6. Create volumes
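
For illustration only, steps 1 to 5 above could be sketched as a dry run that prints the commands instead of executing them. The PPA names and the mount point are hypothetical placeholders, not values given in this thread; substitute the ones actually in use. The backup (step 0) and volume creation (step 6) are left out because they depend on the local setup.

```shell
#!/bin/sh
# Dry-run sketch of the downgrade steps: it only prints commands, nothing is executed.
# OLD_PPA, NEW_PPA and MOUNTS are hypothetical placeholders (assumptions, not from the thread).
OLD_PPA="${OLD_PPA:-ppa:gluster/glusterfs-3.8}"
NEW_PPA="${NEW_PPA:-ppa:gluster/glusterfs-3.4}"
MOUNTS="${MOUNTS:-/mnt/gluster}"

plan_downgrade() {
    # Step 1: unmount every gluster mount point listed in MOUNTS
    for m in $MOUNTS; do
        echo "umount $m"
    done
    # Step 2: purge the currently installed packages
    echo "apt-get purge glusterfs-server glusterfs-client"
    # Steps 3 and 4: swap the package source, then refresh the package index
    echo "add-apt-repository --remove $OLD_PPA"
    echo "add-apt-repository $NEW_PPA"
    echo "apt-get update"
    # Step 5: install the packages from the older source
    echo "apt-get install glusterfs-server glusterfs-client"
}

plan_downgrade
```

Reviewing the printed plan before running anything by hand keeps the destructive steps (unmount, purge) under manual control. Whether purge also clears runtime state is the open question asked below, so the sketch deliberately does not delete any directories.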

                                                          Is "purge" enough to delete all configuration files of the currently installed version, or do I need to manually clear some residues before installing an older version?

                                                          Thanks.

                                                          _______________________________________________
                                                          Gluster-users mailing list
                                                          Gluster-users@xxxxxxxxxxx
                                                          http://www.gluster.org/mailman/listinfo/gluster-users

                                                          --
                                                          - Atin (atinm)


-- 

~ Atin (atinm)
