Hello Vijay,
I’m really sorry to bother you, but the situation is critical for our research jobs. Since this morning, because of the previously described situation, we have decided to stop all production and data access until your script fixes the problem.
After rebooting our storage cluster this morning (French time), there are no more runaway processes, CPU usage is back to normal, and the quotas no longer seem to grow (but they still contain big errors: > 1TB, maybe much more). However, several quotas are no longer computed at all (for the past couple of hours), as you can read below:

[root@lucifer ~]# gluster volume quota vol_home list
                  Path Hard-limit Soft-limit     Used  Available
--------------------------------------------------------------------------------
/derreumaux_team           11.0TB        80%   0Bytes     11.0TB
/baaden_team               20.0TB        80%   15.1TB      4.9TB
/sterpone_team             14.0TB        80%   0Bytes     14.0TB
/amyloid_team               7.0TB        80%    6.4TB    577.5GB
/amyloid_team/nguyen        4.0TB        80%    3.7TB    312.7GB
/sacquin_team              10.0TB        80%   0Bytes     10.0TB
/simlab_team                5.0TB        80%    1.3TB      3.7TB
I don’t know your operational hours in India, but I think your end of day has passed, right? I’m really sorry to press you, but we are under a lot of pressure because this is not a good time to stop scientific computation and production.
Thanks in advance for your script and for your help. Is there anything I can do to accelerate the script’s development (coding it myself, or something like that)?
Have a nice evening (or night).
Geoffrey
------------------------------------------------------
Geoffrey Letessier
IT manager & systems engineer
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
On Sunday 28 June 2015 01:34 PM, Geoffrey Letessier wrote:
Hello,
@Krutika: Thanks for forwarding my issue.
Everything is going completely crazy; other quotas are exploding. After removing my previous failing quota, some other quotas have grown, as you can read below:
[root@lucifer ~]# gluster volume quota vol_home list
                  Path Hard-limit Soft-limit     Used  Available
--------------------------------------------------------------------------------
/baaden_team               20.0TB        90%   15.1TB      4.9TB
/sterpone_team             14.0TB        90%   25.5TB     0Bytes
/simlab_team                5.0TB        90%    1.3TB      3.7TB
/sacquin_team              10.0TB        90%    8.3TB      1.7TB
/admin_team                 1.0TB        90%   17.0GB   1007.0GB
/amyloid_team               7.0TB        90%    6.4TB    577.5GB
/amyloid_team/nguyen        4.0TB        90%    3.7TB    312.7GB
[root@lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/sterpone_team
cl-storage1: 3.1T /export/brick_home/brick1/sterpone_team
cl-storage1: 2.3T /export/brick_home/brick2/sterpone_team
cl-storage3: 2.7T /export/brick_home/brick1/sterpone_team
cl-storage3: 2.9T /export/brick_home/brick2/sterpone_team
=> ~11TB (not 25.5TB!!!)
[root@lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/baaden_team
cl-storage1: 4.2T /export/brick_home/brick1/baaden_team
cl-storage3: 3.7T /export/brick_home/brick1/baaden_team
cl-storage1: 3.6T /export/brick_home/brick2/baaden_team
cl-storage3: 3.5T /export/brick_home/brick2/baaden_team
=> ~15TB (not 14TB).
Etc.
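To check every directory the same way, something like the following loop could be run (a rough sketch only, assuming the brick layout shown above; du -sb gives byte counts that can be summed), printing quota’s view next to the real summed brick usage:

for team in baaden_team sterpone_team simlab_team sacquin_team admin_team amyloid_team; do
    echo "== /$team"
    gluster volume quota vol_home list /$team          # quota's view
    pdsh -w cl-storage[1,3] du -sb /export/brick_home/brick*/$team \
        | awk '{sum += $2} END {printf "bricks total: %.2f TB\n", sum / 1e12}'   # real usage
done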
Could you please help me solve this issue urgently? The situation is blocking, and I must stop production until it is fixed. Do you think upgrading the storage cluster to GlusterFS 3.7.1 (the latest version) could fix the problem?
We need to fix this issue manually: find the directories whose quota size is miscalculated and repair the metadata on the bricks. We are writing an automated script to fix this and will provide it by end of day IST.
Thanks,
Vijay
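Until the script is ready, a manual spot-check along these lines can show which directories carry a bad on-disk value (a sketch only, not the script itself; trusted.glusterfs.quota.size is the xattr where quota keeps its per-directory accounting on the bricks, but please verify the key name against your build):

# Run on each storage node; compares quota's stored size xattr with du.
for dir in /export/brick_home/brick*/sterpone_team; do
    echo "== $dir"
    getfattr -h -d -m . -e hex $dir | grep quota.size   # quota's accounting (hex byte count)
    du -sb $dir                                         # actual on-disk usage
done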
Thanks in advance,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
IT manager & systems engineer
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
Copying Vijai and Raghavendra for help...
-Krutika
From: "Geoffrey Letessier" <geoffrey.letessier@xxxxxxx>
To: "Krutika Dhananjay" <kdhananj@xxxxxxxxxx>
Sent: Saturday, June 27, 2015 2:13:52 AM
Subject: Re: GlusterFS 3.5.3 - untar: very poor performance
Hi Krutika,
Since I re-enabled the quota feature on my volume vol_home, one of the defined quotas has gone crazy, and it is a very big problem for us. All day after re-enabling it, I watched the reported used space grow (without any user I/O on the volume):
[root@lucifer ~]# gluster volume quota vol_home list|grep derreumaux_team
/derreumaux_team           14.0TB        80%   13.7TB    357.2GB
[root@lucifer ~]# gluster volume quota vol_home list /derreumaux_team
                  Path Hard-limit Soft-limit     Used  Available
--------------------------------------------------------------------------------
/derreumaux_team           14.0TB        80%   13.1TB    874.1GB
[root@lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/derreumaux_team
cl-storage3: 590G /export/brick_home/brick1/derreumaux_team
cl-storage3: 611G /export/brick_home/brick2/derreumaux_team
cl-storage1: 567G /export/brick_home/brick1/derreumaux_team
cl-storage1: 564G /export/brick_home/brick2/derreumaux_team
As you can see from these three commands, I get three different results; worse, the quota system is very far from the real disk usage (13.7TB vs. 13.1TB, versus about 2.3TB actually on disk).
Can you please help fix this very quickly? The whole group is completely blocked by its exceeded quota.
Thank you so much in advance,
Have a nice weekend,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
IT manager & systems engineer
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
No, but if you are saying it is the 3.5.3 rpm version, then that bug does not exist there. And still it is weird that you are seeing such bad performance. :-/
Anything suspicious in the logs?
-Krutika
From: "Geoffrey Letessier"
<geoffrey.letessier@xxxxxxx>
To: "Krutika Dhananjay" <kdhananj@xxxxxxxxxx>
Sent: Friday, June 26, 2015 1:27:16
PM
Subject: Re:
GlusterFS 3.5.3 - untar: very poor
performance
No, it’s the 3.5.3 RPM version I found in your repository (published in November 2014). So, you suggest I simply upgrade all servers and clients to the new 3.5.4 version? Wouldn’t it be better to upgrade the whole system (servers and clients) to 3.7.1?
Geoffrey
------------------------------------------------------
Geoffrey Letessier
IT manager & systems engineer
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
Also, are you running the 3.5.3 rpms on the clients? Or is it a patched version with more fixes on top of 3.5.3?

The reason I ask is that there was a performance issue in the replication module introduced after 3.5.3 and fixed in 3.5.4. I’m wondering if that could be causing the issue you are experiencing.
-Krutika
From: "Geoffrey
Letessier" <geoffrey.letessier@xxxxxxx>
To: "Krutika
Dhananjay" <kdhananj@xxxxxxxxxx>
Sent: Friday,
June 26, 2015 10:05:26 AM
Subject: Re:
GlusterFS
3.5.3 - untar: very poor
performance
Hi Krutika,
Oops, I disabled the quota manager without saving the configuration. Could you tell me how to retrieve the quota list information?

I’m going to test the untar in the meantime.
Geoffrey
------------------------------------------------------
Geoffrey Letessier
IT manager & systems engineer
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
Hi,
So I tried out a kernel src tree untar locally on a plain replicate (1x2) volume, and it took 7m30s on average. This was on VMs, with no RDMA and no quota enabled.

Could you try the same thing on a volume without quota, to see if it makes a difference to the perf?
-Krutika
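If you would rather test on vol_home itself, quota can be toggled per volume; if I remember 3.5 correctly, disabling quota drops the configured limits, so save them first and re-set them afterwards:

gluster volume quota vol_home list > /root/quota_limits.txt   # keep a copy of the limits
gluster volume quota vol_home disable                         # run the untar bench now
gluster volume quota vol_home enable
gluster volume quota vol_home limit-usage /baaden_team 20TB   # re-set each saved limit this way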
From: "Geoffrey
Letessier" <geoffrey.letessier@xxxxxxx>
To: "Krutika
Dhananjay" <kdhananj@xxxxxxxxxx>
Sent: Wednesday,
June 24, 2015
10:21:13 AM
Subject:
Re:
GlusterFS 3.5.3 -
untar: very poor
performance
Hi Krutika,
OK, thank you very much in advance.

Concerning the quota system, are you in touch with Vijaykumar? I have been waiting for an answer for a couple of days now, maybe more.

One more time, thank you. Have a nice day (in France it’s 6:50 AM).
Geoffrey
-----------------------------------------------
Geoffrey Letessier
IT manager & systems engineer
CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
Ok, so for anything related to replication, I can help you out. But for quota, it would be better to ask Vijaikumar Mallikarjuna or Raghavendra G on the mailing list. I used to work on quota a long time back, but I am not in touch with the component anymore and do not know the latest changes to it.

For the performance issue, I will try a linux kernel src untar on my machines and let you know what I find.
-Krutika
From: "Geoffrey Letessier" <geoffrey.letessier@xxxxxxx>
To: "Krutika Dhananjay" <kdhananj@xxxxxxxxxx>
Sent: Monday, June 22, 2015 9:00:52 PM
Subject: Re: GlusterFS 3.5.3 - untar: very poor performance
Hi Krutika,
Sorry for the delay, but I was in meetings all day.

Good to hear from you as well. :)
;-)
So you are seeing this bad performance only in 3.5.3? Any other releases you tried this test on, where the results were much better with replication?
Yes, but I’m not sure my issue concerns only this specific release. A few days ago, the untar process (with the same version of GlusterFS) took around 8 minutes; now it takes around 32. 8 minutes was already too much, but what about 32? :)
That said, my problem only concerns small files, because if I play with dd (or similar) on a single big file, all is OK (client write throughput: ~1GB/s => ~500MB/s into each replica).
If I run my bench on my distributed-only volume, I get good performance (untar: ~1m44s, etc.).
In addition, I don’t know if it is important, but I have some trouble with the GlusterFS group quota: there are a lot of conflicts between the quota size and the actual file size, which don’t match, and a lot of "quota xattrs not found" messages from the quota-verify glusterfs app. You can find an extract of the quota-verify output in the attachment.
If so, could you please let me know? Meanwhile, let me try the untar myself on my vms to see what could be causing the perf issue.
OK, thanks.
See you,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
IT manager & systems engineer
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
Hi Geoffrey,
Good to hear from you as well. :)

Ok, so you say disabling write-behind does not help. Makes me wonder what the problem could be.
So you are seeing this bad performance only in 3.5.3? Any other releases you tried this test on, where the results were much better with replication?

If so, could you please let me know? Meanwhile, let me try the untar myself on my vms to see what could be causing the perf issue.
-Krutika
From: "Geoffrey Letessier" <geoffrey.letessier@xxxxxxx>
To: "Krutika Dhananjay" <kdhananj@xxxxxxxxxx>
Sent: Monday, June 22, 2015 10:14:26 AM
Subject: Re: GlusterFS 3.5.3 - untar: very poor performance
Hi Krutika,
It’s good to read you again :) Here are my answers:
1- Could you remind me how to know whether self-heal is currently in progress? I don’t notice anything special at the mount point (except the /var/run/gluster/vol_home one) or any dedicated process, but maybe I’m looking in the wrong place (see my guess after point 2 below).
2- OK, I just disabled the write-behind parameter and reran the bench. I’ll let you know more when I get to my office (I’m still at home at the moment).
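About point 1, the only candidate I know of is the heal-info command; is that the right place to look? (a guess on my part, not something I’m sure of):

[root@lucifer ~]# gluster volume heal vol_home info               # entries still needing heal
[root@lucifer ~]# gluster volume heal vol_home info heal-failed   # heals that failed recently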
See you, and thank you for helping.
Geoffrey
-----------------------------------------------
Geoffrey Letessier
IT manager & systems engineer
CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
Hi Geoffrey,
1. Was self-heal also in progress while I/O was happening on the volume?
2. Also, there seem to be quite a few fsyncs, which could possibly have slowed things down a bit. Could you disable write-behind and get the time stats one more time, to rule out the possibility that write-behind’s out-of-order writes are increasing the number of fsyncs issued by the replication module?
-Krutika
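For reference, write-behind is an ordinary volume option, so the test just needs a set/reset around the bench (remember to turn it back on afterwards):

gluster volume set vol_home performance.write-behind off   # disable for the bench run
# ... rerun the untar bench and collect the time stats ...
gluster volume set vol_home performance.write-behind on    # restore the default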
From: "Geoffrey Letessier" <geoffrey.letessier@xxxxxxx>
To: gluster-users@xxxxxxxxxxx
Sent: Saturday, June 20, 2015 6:04:40 AM
Subject: Re: GlusterFS 3.5.3 - untar: very poor performance
Re,
For comparison, here is the output of the same script run on a distributed-only volume (2 of the 4 servers described previously, with 2 bricks each):
#######################################################
################ UNTAR time consumed ################
#######################################################
real    1m44.698s
user    0m8.891s
sys     0m8.353s

#######################################################
################# DU time consumed ##################
#######################################################
554M    linux-4.1-rc6
real    0m21.062s
user    0m0.100s
sys     0m1.040s

#######################################################
################# FIND time consumed ################
#######################################################
52663
real    0m21.325s
user    0m0.104s
sys     0m1.054s

#######################################################
################# GREP time consumed ################
#######################################################
7952
real    0m43.618s
user    0m0.922s
sys     0m3.626s

#######################################################
################# TAR time consumed #################
#######################################################
real    0m50.577s
user    0m29.745s
sys     0m4.086s

#######################################################
################# RM time consumed ##################
#######################################################
real    0m41.133s
user    0m0.171s
sys     0m2.522s
The performance is amazingly different!
Geoffrey
-----------------------------------------------
Geoffrey Letessier
IT manager & systems engineer
CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
Dear all,
I just noticed that on the main volume of my HPC cluster, my IO operations have become impressively poor. Doing some file operations on a compressed Linux kernel source archive (roughly 80MB, with 52,000 files inside), the untar operation alone can take more than half an hour, as you can read below:
#######################################################
################ UNTAR time consumed ################
#######################################################
real    32m42.967s
user    0m11.783s
sys     0m15.050s

#######################################################
################# DU time consumed ##################
#######################################################
557M    linux-4.1-rc6
real    0m25.060s
user    0m0.068s
sys     0m0.344s

#######################################################
################# FIND time consumed ################
#######################################################
52663
real    0m25.687s
user    0m0.084s
sys     0m0.387s

#######################################################
################# GREP time consumed ################
#######################################################
7952
real    2m15.890s
user    0m0.887s
sys     0m2.777s

#######################################################
################# TAR time consumed #################
#######################################################
real    1m5.551s
user    0m26.536s
sys     0m2.609s

#######################################################
################# RM time consumed ##################
#######################################################
real    2m51.485s
user    0m0.167s
sys     0m1.663s
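In shape, the attached mybench.sh is just a sequence of timed operations like the following simplified sketch (the working directory, tarball name and grep pattern here are illustrative, not the exact script):

#!/bin/bash
# Shape of the bench: time common small-file operations on the mounted volume.
cd /home/bench || exit 1                           # assumed test directory on the volume
time tar xJf linux-4.1-rc6.tar.xz                  # UNTAR
time du -sh linux-4.1-rc6                          # DU
time find linux-4.1-rc6 | wc -l                    # FIND (prints the file count)
time grep -r MODULE linux-4.1-rc6 | wc -l          # GREP (placeholder pattern)
time tar cf linux-4.1-rc6.tar linux-4.1-rc6        # TAR
time rm -rf linux-4.1-rc6 linux-4.1-rc6.tar        # RM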
For information, this volume is a distributed-replicated one, composed of 4 servers with 2 bricks each. Each brick is a 12-drive RAID6 vdisk with good native performance (around 1.2GB/s).
In comparison, when I use dd to generate a 100GB file on the same volume, my write throughput is around 1GB/s (client side) and 500MB/s (server side), because of replication:
Client side:
[root@node056 ~]# ifstat -i ib0
        ib0
 KB/s in  KB/s out
 3251.45  1.09e+06
 3139.80  1.05e+06
 3185.29  1.06e+06
 3293.84  1.09e+06
...
Server side:
[root@lucifer ~]# ifstat -i ib0
        ib0
 KB/s in  KB/s out
561818.1   1746.42
560020.3   1737.92
526337.1   1648.20
513972.7   1613.69
...
DD command:
[root@node056 ~]# dd if=/dev/zero of=/home/root/test.dd bs=1M count=100000
100000+0 records in
100000+0 records out
104857600000 bytes (105 GB) copied, 202.99 s, 517 MB/s
So this issue doesn’t seem to come from the network (which is InfiniBand in this case).
You can find a set of files in attachments:
- mybench.sh: the bench script
- benches.txt: output of my "bench"
- profile.txt: gluster volume profile during the "bench"
- vol_status.txt: gluster volume status
- vol_info.txt: gluster volume info
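(profile.txt was captured with the usual profile commands around the bench run, i.e. something like:

gluster volume profile vol_home start                # begin collecting per-brick stats
# ... run the bench ...
gluster volume profile vol_home info > profile.txt   # dump the collected stats
gluster volume profile vol_home stop
)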
Can someone help me fix this? It’s very critical, because this volume is on an HPC cluster in production.
Thanks in advance,
Geoffrey
-----------------------------------------------
Geoffrey Letessier
IT manager & systems engineer
CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
<benches.txt>
<mybench.sh>
<profile.txt>
<vol_info.txt>
<vol_status.txt>
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users