Re: Slow write times to gluster disk

Hi Guys,

I was wondering what our next steps should be to solve the slow write times.

Recently I was debugging a large code that writes a lot of output at every time step. When I tried writing to our gluster disks, it was taking over a day to do a single time step, whereas if I had the same program (same hardware, network) write to our nfs disk, the time per time step was about 45 minutes. What we are shooting for is similar write times whether we write to gluster or nfs.

Thanks

Pat


On 06/02/2017 01:07 AM, Ben Turner wrote:
Are you sure using conv=sync is what you want?  I normally use conv=fdatasync, I'll look up the difference between the two and see if it affects your test.
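For reference (worth double-checking against the dd man page): conv=sync does not flush anything to disk at all; it only pads short *input* blocks with NULs up to the block size. The flags that actually include flush time in the reported rate are conv=fdatasync/conv=fsync (one sync at the end) and oflag=sync (sync on every write). A small sketch on a throwaway file under /tmp:

```shell
# Sketch: compare dd's sync-related flags on a small scratch file.
f=/tmp/dd_flag_test_$$

# conv=sync: pads short input blocks with zeros; does NOT force data to disk.
dd if=/dev/zero of=$f count=8 bs=1M conv=sync 2>/dev/null

# conv=fdatasync: one fdatasync() at the end, so the reported rate
# includes the time for the data to reach stable storage.
dd if=/dev/zero of=$f count=8 bs=1M conv=fdatasync 2>/dev/null

# oflag=sync: open with O_SYNC, flushing every write() -- usually the slowest.
dd if=/dev/zero of=$f count=8 bs=1M oflag=sync 2>/dev/null

wc -c < $f   # 8388608 bytes in each case
rm -f $f
```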


-b

----- Original Message -----
From: "Pat Haley" <phaley@xxxxxxx>
To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
Cc: "Ravishankar N" <ravishankar@xxxxxxxxxx>, gluster-users@xxxxxxxxxxx, "Steve Postma" <SPostma@xxxxxxxxxxxx>, "Ben
Turner" <bturner@xxxxxxxxxx>
Sent: Tuesday, May 30, 2017 9:40:34 PM
Subject: Re:  Slow write times to gluster disk


Hi Pranith,

The "dd" command was:

      dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync

There were 2 instances where dd reported 22 seconds. The output from the
dd tests is in

http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt

Pat

On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
Pat,
        What is the command you used? As per the following output, it
seems like at least one write operation took 16 seconds, which is
really bad:

       96.39    1165.10 us      89.00 us   *16487014.00 us*     393212       WRITE


On Tue, May 30, 2017 at 10:36 PM, Pat Haley <phaley@xxxxxxx
<mailto:phaley@xxxxxxx>> wrote:


     Hi Pranith,

     I ran the same 'dd' test both in the gluster test volume and in
     the .glusterfs directory of each brick.  The median results (12 dd
     trials in each test) are similar to before

       * gluster test volume: 586.5 MB/s
       * bricks (in .glusterfs): 1.4 GB/s

     The profile for the gluster test-volume is in

     http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt

     Thanks

     Pat




     On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
      Let's start with the same 'dd' test we were testing with, to see
      what the numbers are. Please provide profile numbers for the
      same. From there on we will start tuning the volume to see what
      we can do.

     On Tue, May 30, 2017 at 9:16 PM, Pat Haley <phaley@xxxxxxx
     <mailto:phaley@xxxxxxx>> wrote:


         Hi Pranith,

         Thanks for the tip.  We now have the gluster volume mounted
         under /home.  What tests do you recommend we run?

         Thanks

         Pat



         On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:

         On Tue, May 16, 2017 at 9:20 PM, Pat Haley <phaley@xxxxxxx
         <mailto:phaley@xxxxxxx>> wrote:


             Hi Pranith,

              Sorry for the delay.  I never received your reply
              (but I did receive Ben Turner's follow-up to it).
              So we tried to create a gluster volume under
              /home using different variations of

             gluster volume create test-volume
             mseas-data2:/home/gbrick_test_1
             mseas-data2:/home/gbrick_test_2 transport tcp

             However we keep getting errors of the form

             Wrong brick type: transport, use
             <HOSTNAME>:<export-dir-abs-path>

             Any thoughts on what we're doing wrong?


          I think "transport tcp" needs to come before the brick list.
          In any case, tcp is the default transport, so there is no need
          to specify it at all; just remove those two words from the CLI.
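A sketch of the two forms that should parse, based on the error message (the CLI appears to stop reading bricks at the first token that is not host:path); please verify against `gluster volume create` help before running:

```shell
# transport keyword before the brick list (if specified at all):
gluster volume create test-volume transport tcp \
    mseas-data2:/home/gbrick_test_1 \
    mseas-data2:/home/gbrick_test_2

# or simply omit it, since tcp is the default transport:
gluster volume create test-volume \
    mseas-data2:/home/gbrick_test_1 \
    mseas-data2:/home/gbrick_test_2
```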


             Also do you have a list of the test we should be running
             once we get this volume created?  Given the time-zone
             difference it might help if we can run a small battery
             of tests and post the results rather than test-post-new
             test-post... .


          As far as I remember, this is the first time I am doing
          performance analysis with users. In our team there are
          separate engineers who do these tests; Ben, who replied
          earlier, is one such engineer.

         Ben,
             Have any suggestions?


             Thanks

             Pat



             On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:

             On Thu, May 11, 2017 at 9:32 PM, Pat Haley
             <phaley@xxxxxxx <mailto:phaley@xxxxxxx>> wrote:


                 Hi Pranith,

                 The /home partition is mounted as ext4
                 /home ext4 defaults,usrquota,grpquota   1 2

                  The brick partitions are mounted as xfs
                 /mnt/brick1 xfs defaults 0 0
                 /mnt/brick2 xfs defaults 0 0

                 Will this cause a problem with creating a volume
                 under /home?


              I don't think the bottleneck is the disk. Can you do the
              same tests on your new volume to confirm?


                 Pat



                 On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:

                 On Thu, May 11, 2017 at 8:57 PM, Pat Haley
                 <phaley@xxxxxxx <mailto:phaley@xxxxxxx>> wrote:


                     Hi Pranith,

                     Unfortunately, we don't have similar hardware
                     for a small scale test.  All we have is our
                     production hardware.


                  You said something about a /home partition which has
                  fewer disks; we can create a plain distribute
                  volume inside one of those directories. After we
                  are done, we can remove the setup. What do you say?


                     Pat




                     On 05/11/2017 07:05 AM, Pranith Kumar
                     Karampuri wrote:

                     On Thu, May 11, 2017 at 2:48 AM, Pat Haley
                     <phaley@xxxxxxx <mailto:phaley@xxxxxxx>> wrote:


                         Hi Pranith,

                         Since we are mounting the partitions as
                         the bricks, I tried the dd test writing
                         to
                         <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
                         The results without oflag=sync were 1.6
                         Gb/s (faster than gluster but not as fast
                         as I was expecting given the 1.2 Gb/s to
                         the no-gluster area w/ fewer disks).


                      Okay, then 1.6Gb/s is what we need to target,
                      considering your volume is just distribute.
                      Is there any way you can run tests on similar
                      hardware but at a smaller scale, just so we can
                      run the workload and learn more about the
                      bottlenecks in the system? We can probably try
                      to get the speed to 1.2Gb/s on the /home
                      partition you were telling me about yesterday.
                      Let me know if that is something you are okay
                      to do.


                         Pat



                         On 05/10/2017 01:27 PM, Pranith Kumar
                         Karampuri wrote:

                         On Wed, May 10, 2017 at 10:15 PM, Pat
                         Haley <phaley@xxxxxxx
                         <mailto:phaley@xxxxxxx>> wrote:


                             Hi Pranith,

                             Not entirely sure (this isn't my
                             area of expertise). I'll run your
                             answer by some other people who are
                             more familiar with this.

                             I am also uncertain about how to
                             interpret the results when we also
                             add the dd tests writing to the
                             /home area (no gluster, still on the
                             same machine)

                               * dd test without oflag=sync
                                 (rough average of multiple tests)
                                   o gluster w/ fuse mount : 570 Mb/s
                                   o gluster w/ nfs mount: 390 Mb/s
                                   o nfs (no gluster):  1.2 Gb/s
                               * dd test with oflag=sync (rough
                                 average of multiple tests)
                                   o gluster w/ fuse mount:  5 Mb/s
                                   o gluster w/ nfs mount: 200 Mb/s
                                   o nfs (no gluster): 20 Mb/s

                             Given that the non-gluster area is a
                             RAID-6 of 4 disks while each brick
                             of the gluster area is a RAID-6 of
                             32 disks, I would naively expect the
                             writes to the gluster area to be
                             roughly 8x faster than to the
                             non-gluster.


                          I think a better test is to try to write
                          a file using nfs, without any gluster,
                          to a location that is not inside the
                          brick but somewhere else on the same
                          disk(s). If you are mounting the
                          partition as the brick, then we can
                          write to a file inside the .glusterfs
                          directory, something like
                          <brick-path>/.glusterfs/<file-to-be-removed-after-test>.



                             I still think we have a speed issue,
                             I can't tell if fuse vs nfs is part
                             of the problem.


                          I got interested in the post because I
                          read that fuse speed is lower than nfs
                          speed, which is counter-intuitive to my
                          understanding, so I wanted clarifications.
                          Now that I have my clarifications, where
                          fuse outperformed nfs without sync, we
                          can resume testing as described above
                          and try to find what it is. Based on
                          your email-id I am guessing you are in
                          Boston and I am in Bangalore, so if you
                          are okay with doing this debugging over
                          multiple days because of the timezones, I
                          will be happy to help. Please be a bit
                          patient with me; I am under a release
                          crunch, but I am very curious about the
                          problem you posted.

                             Was there anything useful in the
                             profiles?


                          Unfortunately the profiles didn't help me
                          much. I think we are collecting the
                          profiles from an active volume, so they
                          contain a lot of information that does
                          not pertain to dd, which makes it
                          difficult to isolate dd's contribution.
                          So I went through your post again and
                          found something I hadn't paid much
                          attention to earlier, namely oflag=sync,
                          so I did my own tests on my setup with
                          FUSE and sent that reply.


                             Pat



                             On 05/10/2017 12:15 PM, Pranith
                             Kumar Karampuri wrote:
                              Okay good. At least this validates
                              my doubts. Handling O_SYNC in
                              gluster NFS and fuse is a bit
                              different.
                              When an application opens a file
                              with O_SYNC on a fuse mount, each
                              write syscall has to be written to
                              disk as part of the syscall, whereas
                              in the case of NFS there is no
                              concept of open. NFS performs the
                              write through a handle saying it
                              needs to be a synchronous write, so
                              a write() syscall is performed first
                              and then an fsync(); a write on an
                              fd with O_SYNC becomes write+fsync.
                              My guess is that when multiple
                              threads do this write+fsync()
                              operation on the same file,
                              multiple writes are batched
                              together before being written to
                              disk, so the throughput on the disk
                              increases.

                             Does it answer your doubts?
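The two patterns can be imitated with plain dd on any local filesystem (a sketch, not gluster-specific): oflag=sync makes every write() synchronous, like the fuse O_SYNC path, while conv=fsync does buffered writes followed by a single fsync(), closer to the batched write+fsync pattern described above.

```shell
f=/tmp/osync_demo_$$

# Every write() synced individually (O_SYNC), like the fuse client's path:
dd if=/dev/zero of=$f count=8 bs=1M oflag=sync 2>/dev/null

# Buffered write()s, then one fsync() at the end -- writes can be batched:
dd if=/dev/zero of=$f count=8 bs=1M conv=fsync 2>/dev/null

rm -f $f
```

Comparing the rates dd reports for the two runs gives a rough feel for the cost of per-write syncing on that disk.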

                             On Wed, May 10, 2017 at 9:35 PM,
                             Pat Haley <phaley@xxxxxxx
                             <mailto:phaley@xxxxxxx>> wrote:


                                 Without the oflag=sync and only
                                 a single test of each, the FUSE
                                 is going faster than NFS:

                                 FUSE:
                                 mseas-data2(dri_nascar)% dd
                                 if=/dev/zero count=4096
                                 bs=1048576 of=zeros.txt conv=sync
                                 4096+0 records in
                                 4096+0 records out
                                 4294967296 bytes (4.3 GB)
                                 copied, 7.46961 s, 575 MB/s


                                 NFS
                                 mseas-data2(HYCOM)% dd
                                 if=/dev/zero count=4096
                                 bs=1048576 of=zeros.txt conv=sync
                                 4096+0 records in
                                 4096+0 records out
                                 4294967296 bytes (4.3 GB)
                                 copied, 11.4264 s, 376 MB/s



                                 On 05/10/2017 11:53 AM, Pranith
                                 Kumar Karampuri wrote:
                                 Could you let me know the
                                 speed without oflag=sync on
                                 both the mounts? No need to
                                 collect profiles.

                                 On Wed, May 10, 2017 at 9:17
                                 PM, Pat Haley <phaley@xxxxxxx
                                 <mailto:phaley@xxxxxxx>> wrote:


                                     Here is what I see now:

                                     [root@mseas-data2 ~]#
                                     gluster volume info

                                     Volume Name: data-volume
                                     Type: Distribute
                                     Volume ID:
                                     c162161e-2a2d-4dac-b015-f31fd89ceb18
                                     Status: Started
                                     Number of Bricks: 2
                                     Transport-type: tcp
                                     Bricks:
                                     Brick1:
                                     mseas-data2:/mnt/brick1
                                     Brick2:
                                     mseas-data2:/mnt/brick2
                                     Options Reconfigured:
                                     diagnostics.count-fop-hits: on
                                     diagnostics.latency-measurement:
                                     on
                                     nfs.exports-auth-enable: on
                                     diagnostics.brick-sys-log-level:
                                     WARNING
                                     performance.readdir-ahead: on
                                     nfs.disable: on
                                     nfs.export-volumes: off



                                     On 05/10/2017 11:44 AM,
                                     Pranith Kumar Karampuri wrote:
                                     Is this the volume info
                                     you have?

                                      > [root at mseas-data2 ~]# gluster volume info
                                      > Volume Name: data-volume
                                      > Type: Distribute
                                      > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
                                      > Status: Started
                                      > Number of Bricks: 2
                                      > Transport-type: tcp
                                      > Bricks:
                                      > Brick1: mseas-data2:/mnt/brick1
                                      > Brick2: mseas-data2:/mnt/brick2
                                      > Options Reconfigured:
                                      > performance.readdir-ahead: on
                                      > nfs.disable: on
                                      > nfs.export-volumes: off
                                      I copied this from an old
                                      thread from 2016. This is a
                                      distribute volume. Did you
                                      change any of the options in
                                      between?
                                      --

                                      -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
                                      Pat Haley                          Email:  phaley@xxxxxxx
                                      Center for Ocean Engineering       Phone:  (617) 253-6824
                                      Dept. of Mechanical Engineering    Fax:    (617) 253-8125
                                      MIT, Room 5-213                    http://web.mit.edu/phaley/www/
                                      77 Massachusetts Avenue
                                      Cambridge, MA  02139-4301

                                 --
                                 Pranith





















_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users



