Re: RE : Frequent connect and disconnect messages flooded in logs

Atin Mukherjee <amukherj@xxxxxxxxxx> · Thu, 8 Dec 2016 19:59:34 +0530

On Thu, Dec 8, 2016 at 4:37 PM, Micha Ober <micha2k@xxxxxxxxx> wrote:

    Hi Rafi,

      thank you for your support. It is greatly appreciated.

      Just some more thoughts from my side:

      There have been no reports from other users in *this* thread
      until now, but I have found at least one user with a very similar
      problem in an older thread:

https://www.gluster.org/pipermail/gluster-users/2014-November/019637.html

      He is also reporting disconnects with no apparent reason,
      although his setup is a bit more complicated, also involving a
      firewall. In our setup, all servers/clients are connected via 1
      GbE with no firewall or anything that might block/throttle
      traffic. Also, we are using exactly the same software versions on
      all nodes.

      I can also find some reports in the bugtracker when searching for
      "rpc_client_ping_timer_expired" and "rpc_clnt_ping_timer_expired"
      (looks like spelling changed during versions).

      https://bugzilla.redhat.com/show_bug.cgi?id=1096729

Just FYI, this is a different issue: here GlusterD fails to handle the volume of incoming requests in time because MT-epoll is not enabled.

      https://bugzilla.redhat.com/show_bug.cgi?id=1370683

      But both reports involve large traffic/load on the bricks/disks,
      which is not the case for our setup.

      To give a ballpark figure: Over three days, 30 GiB were written.
      And the data was not written at once, but continuously over the
      whole time.
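
A rough sanity check of that ballpark figure, as a minimal Python sketch. It assumes the 30 GiB were spread evenly over the three days (an average hides any bursts) and compares against the nominal 1 Gbit/s line rate; the function and variable names are illustrative only.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def average_rate_mib_per_s(total_bytes, seconds):
    """Average throughput in MiB/s, assuming an even spread over the period."""
    return total_bytes / seconds / MIB


written_bytes = 30 * GIB            # "30 GiB were written"
elapsed_seconds = 3 * 24 * 3600     # "over three days"

avg = average_rate_mib_per_s(written_bytes, elapsed_seconds)

# Nominal 1 GbE line rate: 1e9 bits/s = 125e6 bytes/s (protocol overhead ignored).
link_mib_per_s = 1e9 / 8 / MIB
utilisation = avg / link_mib_per_s

print(f"{avg:.3f} MiB/s average, {utilisation:.2%} of a 1 GbE link")
```

Under those assumptions the average comes to roughly 0.12 MiB/s, about a tenth of a percent of the nominal link rate.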

      Just to be sure, I have checked the logfiles of one of the other
      clusters right now, which are sitting in the same building, in the
      same rack, even on the same switch, running the same jobs, but
      with glusterfs 3.4.2 and I can see no disconnects in the logfiles.
      So I can definitely rule out our infrastructure as the problem.

      Regards,

      Micha

      On 07.12.2016 at 18:08, Mohammed Rafi K C wrote:

      Hi Micha,
      This is great. I will provide you with a debug build containing two
        fixes for what I suspect may be causing the frequent disconnects,
        though I don't have much data to validate my theory. So I will
        take one more day to dig into that.
      Thanks for your support, and opensource++
      Regards
      Rafi KC

      On 12/07/2016 05:02 AM, Micha Ober wrote:

        Hi,

          thank you for your answer and even more for the question!

          Until now, I was using FUSE. Today I changed all mounts to NFS
          using the same 3.7.17 version.

          But: The problem is still the same. Now, the NFS logfile
          contains lines like these:

          [2016-12-06 15:12:29.006325] C
          [rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired]
          0-gv0-client-7: server X.X.18.62:49153 has not responded in
          the last 42 seconds, disconnecting.
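
To see how often each brick connection times out, lines like the one above can be tallied from a client or NFS log. The following is a minimal Python sketch, not a Gluster tool: the regular expression is inferred only from the single log line quoted here (unwrapped onto one line), it accepts both spellings of the function name that appear in this thread, and the helper name count_ping_timeouts is made up for illustration. The 42 seconds in the message matches the default of the network.ping-timeout volume option.

```python
import re
from collections import Counter

# Pattern inferred from the one log line quoted in this thread (an assumption,
# not an official format description). Accepts rpc_clnt_... and rpc_client_...
PING_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[[^\]]*rpc_cl(?:ie)?nt_ping_timer_expired\] "
    r"(?P<xlator>\S+): server (?P<server>\S+) has not responded "
    r"in the last (?P<secs>\d+) seconds"
)


def count_ping_timeouts(lines):
    """Count ping-timeout disconnects per (client translator, server address)."""
    counts = Counter()
    for line in lines:
        match = PING_RE.search(line)
        if match:
            counts[(match.group("xlator"), match.group("server"))] += 1
    return counts


sample = [
    "[2016-12-06 15:12:29.006325] C "
    "[rpc-clnt-ping.c:165:rpc_clnt_ping_timer_expired] "
    "0-gv0-client-7: server X.X.18.62:49153 has not responded in "
    "the last 42 seconds, disconnecting.",
]

counts = count_ping_timeouts(sample)
for (xlator, server), n in counts.most_common():
    print(f"{xlator} -> {server}: {n} timeout(s)")
```

On a real log, the lines would be read from the file instead of the sample list; a single translator dominating the counts would point at one brick rather than at the network as a whole.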

          Interestingly enough, the IP address X.X.18.62 is the same
          machine! As I wrote earlier, each node serves both as a server
          and a client, as each node contributes bricks to the volume.
          Every server is connecting to itself via its hostname. For
          example, the fstab on the node "giant2" looks like:

          #giant2:/gv0    /shared_data    glusterfs    defaults,noauto            0    0
          #giant2:/gv2    /shared_slurm   glusterfs    defaults,noauto            0    0
          giant2:/gv0     /shared_data    nfs          defaults,_netdev,vers=3    0    0
          giant2:/gv2     /shared_slurm   nfs          defaults,_netdev,vers=3    0    0

          So I understand the disconnects even less. 

          I don't know if it's possible to create a dummy cluster which
          exposes the same behaviour, because the disconnects only
          happen when there are compute jobs running on those nodes -
          and they are GPU compute jobs, so that's something which
          cannot be easily emulated in a VM.

          As we have more clusters (which are running fine with an
          ancient 3.4 version :-)) and we are currently not dependent on
          this particular cluster (which may stay like this for this
          month, I think) I should be able to deploy the debug build on
          the "real" cluster, if you can provide a debug build.

          Regards and thanks,

          Micha

          On 06.12.2016 at 08:15, Mohammed Rafi K C wrote:

          On 12/03/2016 12:56 AM, Micha Ober wrote:

            ** Update: ** I have downgraded from 3.8.6 to 3.7.17 now, but
                the problem still exists.

              Client log: http://paste.ubuntu.com/23569065/

              Brick log: http://paste.ubuntu.com/23569067/

              Please note that each server has two bricks.

              Whereas, according to the logs, one brick loses
                the connection to all other hosts:

              [2016-12-02 18:38:53.703301] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.219:49121 failed (Broken pipe)
              [2016-12-02 18:38:53.703381] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.62:49118 failed (Broken pipe)
              [2016-12-02 18:38:53.703380] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.107:49121 failed (Broken pipe)
              [2016-12-02 18:38:53.703424] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.206:49120 failed (Broken pipe)
              [2016-12-02 18:38:53.703359] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.58:49121 failed (Broken pipe)

The SECOND brick on the SAME host is NOT affected, i.e. no disconnects!
As I said, the network connection is fine and the disks are idle.
The CPU always has 2 free cores.

It looks like I have to downgrade to 3.4 now in order for the disconnects to stop.
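
The brick log excerpt above can be checked for how closely together the peers were dropped. This is a minimal Python sketch under stated assumptions: the regular expression is derived only from the five lines quoted in this mail, and the helper name broken_pipe_events is hypothetical. It is an illustration of the observation, not a diagnosis.

```python
import re
from datetime import datetime

# Pattern inferred from the five brick-log lines quoted above (an assumption).
WRITEV_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] W \[socket\.c:\d+:__socket_rwv\] "
    r"(?P<xlator>\S+): writev on (?P<peer>\S+) failed \((?P<err>[^)]+)\)"
)


def broken_pipe_events(lines):
    """Return (timestamp, peer) tuples for every 'Broken pipe' writev failure."""
    events = []
    for line in lines:
        match = WRITEV_RE.search(line)
        if match and match.group("err") == "Broken pipe":
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, match.group("peer")))
    return events


sample = [
    "[2016-12-02 18:38:53.703301] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.219:49121 failed (Broken pipe)",
    "[2016-12-02 18:38:53.703381] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.62:49118 failed (Broken pipe)",
    "[2016-12-02 18:38:53.703380] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.107:49121 failed (Broken pipe)",
    "[2016-12-02 18:38:53.703424] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.206:49120 failed (Broken pipe)",
    "[2016-12-02 18:38:53.703359] W [socket.c:596:__socket_rwv] 0-tcp.gv0-server: writev on X.X.X.58:49121 failed (Broken pipe)",
]

events = broken_pipe_events(sample)
times = [ts for ts, _ in events]
spread = max(times) - min(times)
peers = sorted({peer for _, peer in events})

print(f"{len(peers)} peers dropped within {spread.total_seconds()} seconds")
```

For the quoted lines, all five peers fail within well under a millisecond of each other, which fits the description of a single brick losing every connection at once rather than individual links failing.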

          Hi Micha,

          Thanks for the update, and sorry for the trouble with the newer
          gluster versions. I can understand the need to downgrade, as it
          is a production setup.

          Can you tell me which clients are used here: fuse, nfs,
          nfs-ganesha, smb or libgfapi?

          Since I'm not able to reproduce the issue (I have been trying
          for the last 3 days) and the logs are not much help here (we
          don't have much logging in the socket layer), could you please
          create a dummy cluster and try to reproduce the issue? If that
          works, we can play with that volume and I could provide a debug
          build which we can use for further debugging.

          If you don't have bandwidth for this, please leave it ;).

          Regards

          Rafi KC

              - Micha

              On 30.11.2016 at 06:57, Mohammed Rafi K C wrote:

              Hi Micha,
              I have changed the thread and subject so that your
                original thread stays dedicated to your original query.
                I have started a new thread to discuss the frequent
                disconnect problem, so that we can try to fix what you
                observed with 3.8.4.
              If anyone else has experienced the same problem,
                  please respond to this mail.

              It would be very helpful if you could give us some more
                logs from clients and bricks. Steps to reproduce would
                also help us chase the problem further.
              Regards
              Rafi KC

              On 11/30/2016 04:44 AM, Micha Ober wrote:

                    I had opened another thread on this mailing list
                    (Subject: "After upgrade from 3.4.2 to 3.8.5 - High CPU
                    usage resulting in disconnects and split-brain").

                    The title may be a bit misleading now, as I am no longer
                    observing high CPU usage after upgrading to 3.8.6, but
                    the disconnects are still happening and the number of
                    files in split-brain is growing.

                    Setup: 6 compute nodes, each serving as a glusterfs
                    server and client, Ubuntu 14.04, two bricks per node,
                    distribute-replicate

                    I have two gluster volumes set up (one for scratch data,
                    one for the slurm scheduler). Only the scratch data
                    volume shows critical errors "[...] has not responded in
                    the last 42 seconds, disconnecting.". So I can rule out
                    network problems; the gigabit link between the nodes is
                    not saturated at all. The disks are almost idle (<10%).

                    I have glusterfs 3.4.2 on Ubuntu 12.04 on another
                    compute cluster, running fine since it was deployed.
                    I had glusterfs 3.4.2 on Ubuntu 14.04 on this cluster,
                    running fine for almost a year.

                    After upgrading to 3.8.5, the problems (as described)
                    started. I would like to use some of the new features of
                    the newer versions (like bitrot), but the users can't
                    run their compute jobs right now because the result
                    files are garbled.

                    There also seems to be a bug report with a similar
                    problem (but no progress):
                    https://bugzilla.redhat.com/show_bug.cgi?id=1370683

                    For me, ALL servers are affected (not isolated to one or
                    two servers).

                    I also see messages like "INFO: task
                    gpu_graphene_bv:4476 blocked for more than 120 seconds."
                    in the syslog.

                    For completeness (gv0 is the scratch volume, gv2 the
                    slurm volume):

                    [root@giant2: ~]# gluster v info

                    Volume Name: gv0
                    Type: Distributed-Replicate
                    Volume ID: 993ec7c9-e4bc-44d0-b7c4-2d977e622e86
                    Status: Started
                    Snapshot Count: 0
                    Number of Bricks: 6 x 2 = 12
                    Transport-type: tcp
                    Bricks:
                    Brick1: giant1:/gluster/sdc/gv0
                    Brick2: giant2:/gluster/sdc/gv0
                    Brick3: giant3:/gluster/sdc/gv0
                    Brick4: giant4:/gluster/sdc/gv0
                    Brick5: giant5:/gluster/sdc/gv0
                    Brick6: giant6:/gluster/sdc/gv0
                    Brick7: giant1:/gluster/sdd/gv0
                    Brick8: giant2:/gluster/sdd/gv0
                    Brick9: giant3:/gluster/sdd/gv0
                    Brick10: giant4:/gluster/sdd/gv0
                    Brick11: giant5:/gluster/sdd/gv0
                    Brick12: giant6:/gluster/sdd/gv0
                    Options Reconfigured:
                    auth.allow: X.X.X.*,127.0.0.1
                    nfs.disable: on

                    Volume Name: gv2
                    Type: Replicate
                    Volume ID: 30c78928-5f2c-4671-becc-8deaee1a7a8d
                    Status: Started
                    Snapshot Count: 0
                    Number of Bricks: 1 x 2 = 2
                    Transport-type: tcp
                    Bricks:
                    Brick1: giant1:/gluster/sdd/gv2
                    Brick2: giant2:/gluster/sdd/gv2
                    Options Reconfigured:
                    auth.allow: X.X.X.*,127.0.0.1
                    cluster.granular-entry-heal: on
                    cluster.locking-scheme: granular
                    nfs.disable: on
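
For reading the gv0 layout above: "6 x 2 = 12" means six replica sets of two bricks each, and Gluster forms the replica sets from consecutive bricks in the listed order. The following minimal Python sketch shows the resulting pairing. The helper name replica_sets is made up for illustration, and the mapping from a client translator such as gv0-client-7 to the eighth brick assumes the usual zero-based numbering of client translators in brick order.

```python
def replica_sets(bricks, replica_count):
    """Group bricks into replica sets of consecutive entries, in listed order."""
    if len(bricks) % replica_count != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [
        bricks[i:i + replica_count]
        for i in range(0, len(bricks), replica_count)
    ]


# Brick1..Brick12 of gv0, in the same order as in the "gluster v info" output.
gv0_bricks = [
    f"giant{node}:/gluster/{disk}/gv0"
    for disk in ("sdc", "sdd")
    for node in range(1, 7)
]

sets = replica_sets(gv0_bricks, 2)
for index, pair in enumerate(sets):
    print(f"replica set {index}: {' <-> '.join(pair)}")

# Assumption: client translators are numbered from 0 in brick order,
# so gv0-client-7 would refer to Brick8.
print("gv0-client-7 ->", gv0_bricks[7])
```

With this ordering each replica pair spans two different nodes (giant1/giant2, giant3/giant4, giant5/giant6) for each of the two disks.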

                  2016-11-30 0:10 GMT+01:00 Micha Ober <micha2k@xxxxxxxxx>:


                            2016-11-29 19:21 GMT+01:00 Micha Ober <micha2k@xxxxxxxxx>:


                                      2016-11-29 18:53 GMT+01:00 Atin Mukherjee <amukherj@xxxxxxxxxx>:

                                          Would you be able to share what is not working for you in 3.8.x (please mention the exact version)? 3.4 is quite old, and falling back to an unsupported version doesn't look like a feasible option.

                                                On Tue, 29 Nov 2016 at 17:01, Micha Ober <micha2k@xxxxxxxxx> wrote:

                                                    Hi,

                                                    I was using gluster 3.4 and upgraded to 3.8, but that
                                                    version proved to be unusable for me. I now need to
                                                    downgrade.

                                                    I'm running Ubuntu 14.04. As upgrades of the op version
                                                    are irreversible, I guess I have to delete all gluster
                                                    volumes and re-create them with the downgraded version.

                                                    0. Back up data
                                                    1. Unmount all gluster volumes
                                                    2. apt-get purge glusterfs-server glusterfs-client
                                                    3. Remove PPA for 3.8
                                                    4. Add PPA for older version
                                                    5. apt-get install glusterfs-server glusterfs-client
                                                    6. Create volumes

                                                    Is "purge" enough to delete all configuration files of
                                                    the currently installed version, or do I need to
                                                    manually clear some residues before installing an older
                                                    version?

                                                    Thanks.

                                                _______________________________________________

                                                Gluster-users mailing
                                                list

                                                Gluster-users@xxxxxxxxxxx

                                                http://www.gluster.org/mailman/listinfo/gluster-users

                                              -- 

                                              -
                                                Atin (atinm)


-- 

~ Atin (atinm)
