Re: Slow write times to gluster disk


 



On Tue, Jun 27, 2017 at 10:17:40AM +0530, Pranith Kumar Karampuri wrote:
> On Mon, Jun 26, 2017 at 7:40 PM, Pat Haley <phaley@xxxxxxx> wrote:
> 
> >
> > Hi All,
> >
> > Decided to try another tests of gluster mounted via FUSE vs gluster
> > mounted via NFS, this time using the software we run in production (i.e.
> > our ocean model writing a netCDF file).
> >
> > gluster mounted via NFS: the run took 2.3 hr
> >
> > gluster mounted via FUSE: the run took 44.2 hr
> >
> > The only problem with using gluster mounted via NFS is that it does not
> > respect the group write permissions which we need.
> >
> > We have an exercise coming up in a couple of weeks.  It seems to me
> > that in order to improve our write times before then, it would be good to
> > solve the group write permissions for gluster mounted via NFS now.  We can
> > then revisit gluster mounted via FUSE afterwards.
> >
> > What information would you need to help us force gluster mounted via NFS
> > to respect the group write permissions?
> >
> 
> +Niels, +Jiffin
> 
> I added two more engineers who work on NFS to check why this problem
> happens in your environment. Let's see what information they may need to
> find and solve this issue.

Hi Pat,

Depending on the number of groups that a user is part of, you may need
to change some volume options. A complete description of the limitations
on the number of groups can be found here:

https://github.com/gluster/glusterdocs/blob/master/Administrator%20Guide/Handling-of-users-with-many-groups.md
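In short, the options that document describes can be applied roughly as below. This is a sketch; the option names are taken from my reading of that page, so verify them against your gluster version before relying on them:

```shell
# Sketch, assuming the options from the linked document apply to your
# version. Have the bricks resolve the full group list server-side
# instead of trusting the (size-limited) list sent by the client:
gluster volume set data-volume server.manage-gids on

# For gluster/NFS mounts, resolve auxiliary groups on the NFS server to
# work around the 16-group AUTH_SYS limit:
gluster volume set data-volume nfs.server-aux-gids on
```

Both options change where group IDs are resolved, so they only help when the brick/NFS servers see the same users and groups as the clients (e.g. via LDAP or identical /etc/group files).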

HTH,
Niels


> 
> 
> >
> > Thanks
> >
> > Pat
> >
> >
> >
> >
> > On 06/24/2017 01:43 AM, Pranith Kumar Karampuri wrote:
> >
> >
> >
> > On Fri, Jun 23, 2017 at 9:10 AM, Pranith Kumar Karampuri <
> > pkarampu@xxxxxxxxxx> wrote:
> >
> >>
> >>
> >> On Fri, Jun 23, 2017 at 2:23 AM, Pat Haley <phaley@xxxxxxx> wrote:
> >>
> >>>
> >>> Hi,
> >>>
> >>> Today we experimented with some of the FUSE options that we found in the
> >>> list.
> >>>
> >>> Changing these options had no effect:
> >>>
> >>> gluster volume set test-volume performance.cache-max-file-size 2MB
> >>> gluster volume set test-volume performance.cache-refresh-timeout 4
> >>> gluster volume set test-volume performance.cache-size 256MB
> >>> gluster volume set test-volume performance.write-behind-window-size 4MB
> >>> gluster volume set test-volume performance.write-behind-window-size 8MB
> >>>
> >>>
> >> This is a good coincidence: I am meeting with the write-behind
> >> maintainer (+Raghavendra G) today about the same doubt. I think we will
> >> have something by EOD IST. I will update you.
> >>
> >
> > Sorry, forgot to update you. It seems like there is a bug in write-behind
> > and the Facebook guys sent a patch http://review.gluster.org/16079 to fix
> > it. But even with that I am not seeing any improvement. Maybe I am doing
> > something wrong. Will update you if I find anything more.
> >
> >> Changing the following option from its default value made the speed slower
> >>>
> >>> gluster volume set test-volume performance.write-behind off (on by default)
> >>>
> >>> Changing the following options initially appeared to give a 10% increase
> >>> in speed, but this vanished in subsequent tests (we think the apparent
> >>> increase may have been due to a lighter workload on the computer from
> >>> other users)
> >>>
> >>> gluster volume set test-volume performance.stat-prefetch on
> >>> gluster volume set test-volume client.event-threads 4
> >>> gluster volume set test-volume server.event-threads 4
> >>>
> >>> Can anything be gleaned from these observations?  Are there other things
> >>> we can try?
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>>
> >>>
> >>> On 06/20/2017 12:06 PM, Pat Haley wrote:
> >>>
> >>>
> >>> Hi Ben,
> >>>
> >>> Sorry this took so long, but we had a real-time forecasting exercise
> >>> last week and I could only get to this now.
> >>>
> >>> Backend Hardware/OS:
> >>>
> >>>    - Much of the information on our back end system is included at the
> >>>    top of http://lists.gluster.org/pipermail/gluster-users/2017-April/030529.html
> >>>    - The specific model of the hard disks is SeaGate ENTERPRISE
> >>>    CAPACITY V.4 6TB (ST6000NM0024). The rated speed is 6Gb/s.
> >>>    - Note: there is one physical server that hosts both the NFS and the
> >>>    GlusterFS areas
> >>>
> >>> Latest tests
> >>>
> >>> I have had time to run the tests for one of the dd tests you requested
> >>> to the underlying XFS FS.  The median rate was 170 MB/s.  The dd results
> >>> and iostat record are in
> >>>
> >>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestXFS/
> >>>
> >>> I'll add tests for the other brick and to the NFS area later.
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>>
> >>> On 06/12/2017 06:06 PM, Ben Turner wrote:
> >>>
> >>> Ok you are correct, you have a pure distributed volume, i.e. no replication overhead.  So normally for pure dist I use:
> >>>
> >>> throughput = slowest of disks / NIC * .6-.7
> >>>
> >>> In your case we have:
> >>>
> >>> 1200 * .6 = 720
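As a quick sanity check of that arithmetic:

```shell
# Ben's guesstimate for a pure distribute volume: slowest of disk / NIC
# throughput times a 0.6-0.7 overhead factor (no replica divisor).
awk 'BEGIN { printf "%d\n", 1200 * 0.6 }'
# prints 720
```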
> >>>
> >>> So you are seeing a little less throughput than I would expect in your configuration.  What I like to do here is:
> >>>
> >>> -First tell me more about your back end storage, will it sustain 1200 MB / sec?  What kind of HW?  How many disks?  What type and specs are the disks?  What kind of RAID are you using?
> >>>
> >>> -Second can you refresh me on your workload?  Are you doing reads / writes or both?  If both, what mix?  Since we are using DD I assume you are working with large file sequential I/O, is this correct?
> >>>
> >>> -Run some DD tests on the back end XFS FS.  I normally have /xfs-mount/gluster-brick, if you have something similar just mkdir on the XFS -> /xfs-mount/my-test-dir.  Inside the test dir run:
> >>>
> >>> If you are focusing on a write workload run:
> >>>
> >>> # dd if=/dev/zero of=/xfs-mount/file bs=1024k count=10000 conv=fdatasync
> >>>
> >>> If you are focusing on a read workload run:
> >>>
> >>> # echo 3 > /proc/sys/vm/drop_caches
> >>> # dd if=/gluster-mount/file of=/dev/null bs=1024k count=10000
> >>>
> >>> ** MAKE SURE TO DROP CACHE IN BETWEEN READS!! **
> >>>
> >>> Run this in a loop similar to how you did in:
> >>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
> >>>
> >>> Run this on both servers one at a time and if you are running on a SAN then run again on both at the same time.  While this is running gather iostat for me:
> >>>
> >>> # iostat -c -m -x 1 > iostat-$(hostname).txt
> >>>
> >>> Let's see how the back end performs on both servers while capturing iostat, then see how the same workload / data looks on gluster.
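The loop Ben describes can be sketched like this. The directory and the tiny bs/count values are placeholders so the sketch runs anywhere; for the real measurement point TESTDIR at the XFS mount and use bs=1024k count=10000 as above:

```shell
#!/bin/sh
# Sketch of the dd-in-a-loop test described above. TESTDIR and the tiny
# sizes are placeholders; substitute /xfs-mount/my-test-dir and the real
# bs/count for an actual measurement.
TESTDIR=$(mktemp -d)

# Uncomment to capture iostat in the background while the writes run
# (requires the sysstat package):
# iostat -c -m -x 1 > "iostat-$(hostname).txt" &
# IOSTAT_PID=$!

for i in 1 2 3; do
    # conv=fdatasync makes dd flush to disk before reporting its rate
    dd if=/dev/zero of="$TESTDIR/file$i" bs=64k count=16 conv=fdatasync 2>&1 |
        tail -n 1
done

# kill "$IOSTAT_PID" 2>/dev/null
COUNT=$(ls "$TESTDIR"/file* | wc -l | tr -d ' ')
rm -rf "$TESTDIR"
```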
> >>>
> >>> -Last thing, when you run your kernel NFS tests are you using the same filesystem / storage you are using for the gluster bricks?  I want to be sure we have an apples to apples comparison here.
> >>>
> >>> -b
> >>>
> >>>
> >>>
> >>> ----- Original Message -----
> >>>
> >>> From: "Pat Haley" <phaley@xxxxxxx>
> >>> To: "Ben Turner" <bturner@xxxxxxxxxx>
> >>> Sent: Monday, June 12, 2017 5:18:07 PM
> >>> Subject: Re:  Slow write times to gluster disk
> >>>
> >>>
> >>> Hi Ben,
> >>>
> >>> Here is the output:
> >>>
> >>> [root@mseas-data2 ~]# gluster volume info
> >>>
> >>> Volume Name: data-volume
> >>> Type: Distribute
> >>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> >>> Status: Started
> >>> Number of Bricks: 2
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: mseas-data2:/mnt/brick1
> >>> Brick2: mseas-data2:/mnt/brick2
> >>> Options Reconfigured:
> >>> nfs.exports-auth-enable: on
> >>> diagnostics.brick-sys-log-level: WARNING
> >>> performance.readdir-ahead: on
> >>> nfs.disable: on
> >>> nfs.export-volumes: off
> >>>
> >>>
> >>> On 06/12/2017 05:01 PM, Ben Turner wrote:
> >>>
> >>> What is the output of gluster v info?  That will tell us more about your
> >>> config.
> >>>
> >>> -b
> >>>
> >>> ----- Original Message -----
> >>>
> >>> From: "Pat Haley" <phaley@xxxxxxx>
> >>> To: "Ben Turner" <bturner@xxxxxxxxxx>
> >>> Sent: Monday, June 12, 2017 4:54:00 PM
> >>> Subject: Re:  Slow write times to gluster disk
> >>>
> >>>
> >>> Hi Ben,
> >>>
> >>> I guess I'm confused about what you mean by replication.  If I look at
> >>> the underlying bricks I only ever have a single copy of any file.  It
> >>> either resides on one brick or the other  (directories exist on both
> >>> bricks but not files).  We are not using gluster for redundancy (or at
> >>> least that wasn't our intent).   Is that what you meant by replication
> >>> or is it something else?
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>> On 06/12/2017 04:28 PM, Ben Turner wrote:
> >>>
> >>> ----- Original Message -----
> >>>
> >>> From: "Pat Haley" <phaley@xxxxxxx>
> >>> To: "Ben Turner" <bturner@xxxxxxxxxx>, "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> >>> Cc: "Ravishankar N" <ravishankar@xxxxxxxxxx>, gluster-users@xxxxxxxxxxx,
> >>> "Steve Postma" <SPostma@xxxxxxxxxxxx>
> >>> Sent: Monday, June 12, 2017 2:35:41 PM
> >>> Subject: Re:  Slow write times to gluster disk
> >>>
> >>>
> >>> Hi Guys,
> >>>
> >>> I was wondering what our next steps should be to solve the slow write
> >>> times.
> >>>
> >>> Recently I was debugging a large code and writing a lot of output at
> >>> every time step.  When I tried writing to our gluster disks, it was
> >>> taking over a day to do a single time step whereas if I had the same
> >>> program (same hardware, network) write to our nfs disk the time per
> >>> time-step was about 45 minutes. What we are shooting for here would be
> >>> to have similar times to either gluster or nfs.
> >>>
> >>> I can see in your test:
> >>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
> >>>
> >>> You averaged ~600 MB / sec (expected for replica 2 with 10G: {~1200 MB /
> >>> sec} / #replicas{2} = 600).  Gluster does client side replication, so with
> >>> replica 2 you will only ever see 1/2 the speed of the slowest part of the
> >>> stack (NW, disk, RAM, CPU).  This is usually NW or disk, and 600 is
> >>> normally a best case.  Now in your output I do see the instances where you
> >>> went down to 200 MB / sec.  I can only explain this in three ways:
> >>>
> >>> 1.  You are not using conv=fdatasync and writes are actually going to
> >>> page
> >>> cache and then being flushed to disk.  During the fsync the memory is not
> >>> yet available and the disks are busy flushing dirty pages.
> >>> 2.  Your storage RAID group is shared across multiple LUNs (like in a SAN)
> >>> and when write times are slow the RAID group is busy servicing other
> >>> LUNs.
> >>> 3.  Gluster bug / config issue / some other unknown unknown.
> >>>
> >>> So I see 2 issues here:
> >>>
> >>> 1.  NFS does in 45 minutes what gluster can do in 24 hours.
> >>> 2.  Sometimes your throughput drops dramatically.
> >>>
> >>> WRT #1 - have a look at my estimates above.  My formula for guesstimating
> >>> gluster perf is: throughput = NIC throughput or storage (whichever is
> >>> slower) / # replicas * overhead (figure .7 or .8).  Also, the larger the
> >>> record size the better for glusterfs mounts; I normally like to be at
> >>> LEAST 64k, up to 1024k:
> >>>
> >>> # dd if=/dev/zero of=/gluster-mount/file bs=1024k count=10000
> >>> conv=fdatasync
> >>>
> >>> WRT #2 - Again, I question your testing and your storage config.  Try
> >>> using
> >>> conv=fdatasync for your DDs, use a larger record size, and make sure that
> >>> your back end storage is not causing your slowdowns.  Also remember that
> >>> with replica 2 you will take ~50% hit on writes because the client uses
> >>> 50% of its bandwidth to write to one replica and 50% to the other.
> >>>
> >>> -b
> >>>
> >>>
> >>>
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>>
> >>> On 06/02/2017 01:07 AM, Ben Turner wrote:
> >>>
> >>> Are you sure using conv=sync is what you want?  I normally use
> >>> conv=fdatasync, I'll look up the difference between the two and see if
> >>> it
> >>> affects your test.
> >>>
> >>>
> >>> -b
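For reference, the two flags Ben mentions do quite different things per dd(1): conv=sync pads each input block with zeros up to the block size and does not flush anything to disk, while conv=fdatasync issues a single fdatasync() before dd exits, so the reported rate includes the flush. A small sketch (tiny sizes so it runs anywhere; neither file is meaningful as a benchmark):

```shell
# conv=sync vs conv=fdatasync, per dd(1).
TMP=$(mktemp -d)

# conv=sync: pads short input blocks with NULs to bs; data may still sit
# in the page cache when dd reports its rate.
dd if=/dev/zero of="$TMP/padded" bs=64k count=8 conv=sync 2>/dev/null

# conv=fdatasync: one fdatasync() at the end, so the timing covers the
# flush to stable storage.
dd if=/dev/zero of="$TMP/flushed" bs=64k count=8 conv=fdatasync 2>/dev/null

SIZE=$(wc -c < "$TMP/flushed" | tr -d ' ')   # 64k * 8 = 524288 bytes
rm -rf "$TMP"
```

This is why the conv=sync numbers in the earlier tests can look faster than the storage could really sustain: they may be measuring the page cache.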
> >>>
> >>> ----- Original Message -----
> >>>
> >>> From: "Pat Haley" <phaley@xxxxxxx>
> >>> To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> >>> Cc: "Ravishankar N" <ravishankar@xxxxxxxxxx>, gluster-users@xxxxxxxxxxx,
> >>> "Steve Postma" <SPostma@xxxxxxxxxxxx>, "Ben Turner" <bturner@xxxxxxxxxx>
> >>> Sent: Tuesday, May 30, 2017 9:40:34 PM
> >>> Subject: Re:  Slow write times to gluster disk
> >>>
> >>>
> >>> Hi Pranith,
> >>>
> >>> The "dd" command was:
> >>>
> >>>         dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
> >>>
> >>> There were 2 instances where dd reported 22 seconds. The output from
> >>> the
> >>> dd tests are in
> >>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
> >>>
> >>> Pat
> >>>
> >>> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
> >>>
> >>> Pat,
> >>>           What is the command you used? As per the following output, it
> >>> seems like at least one write operation took 16 seconds, which is really
> >>> bad:
> >>>
> >>>          96.39    1165.10 us    89.00 us    *16487014.00 us*    393212    WRITE
> >>>
> >>>
> >>> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <phaley@xxxxxxx> wrote:
> >>>
> >>>
> >>>        Hi Pranith,
> >>>
> >>>        I ran the same 'dd' test both in the gluster test volume and
> >>>        in
> >>>        the .glusterfs directory of each brick.  The median results
> >>>        (12
> >>>        dd
> >>>        trials in each test) are similar to before
> >>>
> >>>          * gluster test volume: 586.5 MB/s
> >>>          * bricks (in .glusterfs): 1.4 GB/s
> >>>
> >>>        The profile for the gluster test-volume is in
> >>>
> >>>        http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
> >>>
> >>>        Thanks
> >>>
> >>>        Pat
> >>>
> >>>
> >>>
> >>>
> >>>        On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
> >>>
> >>>        Let's start with the same 'dd' test we were testing with to
> >>>        see,
> >>>        what the numbers are. Please provide profile numbers for the
> >>>        same. From there on we will start tuning the volume to see
> >>>        what
> >>>        we can do.
> >>>
> >>>        On Tue, May 30, 2017 at 9:16 PM, Pat Haley <phaley@xxxxxxx> wrote:
> >>>
> >>>
> >>>            Hi Pranith,
> >>>
> >>>            Thanks for the tip.  We now have the gluster volume
> >>>            mounted
> >>>            under /home.  What tests do you recommend we run?
> >>>
> >>>            Thanks
> >>>
> >>>            Pat
> >>>
> >>>
> >>>
> >>>            On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
> >>>
> >>>            On Tue, May 16, 2017 at 9:20 PM, Pat Haley
> >>>            <phaley@xxxxxxx> wrote:
> >>>
> >>>
> >>>                Hi Pranith,
> >>>
> >>>                Sorry for the delay.  I never received your
> >>>                reply
> >>>                (but I did receive Ben Turner's follow-up to your
> >>>                reply).  So we tried to create a gluster volume
> >>>                under
> >>>                /home using different variations of
> >>>
> >>>                gluster volume create test-volume
> >>>                mseas-data2:/home/gbrick_test_1
> >>>                mseas-data2:/home/gbrick_test_2 transport tcp
> >>>
> >>>                However we keep getting errors of the form
> >>>
> >>>                Wrong brick type: transport, use
> >>>                <HOSTNAME>:<export-dir-abs-path>
> >>>
> >>>                Any thoughts on what we're doing wrong?
> >>>
> >>>
> >>>            You should give transport tcp at the beginning, I think.
> >>>            Anyway, transport tcp is the default, so there is no need
> >>>            to specify it; just remove those two words from the CLI.
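Concretely, a sketch of the corrected CLI (transport tcp is the default, so dropping those two words is enough):

```shell
# Corrected form of the failing command: keep only
# <HOSTNAME>:<export-dir-abs-path> brick entries and omit "transport tcp"
# (the default).
gluster volume create test-volume \
    mseas-data2:/home/gbrick_test_1 \
    mseas-data2:/home/gbrick_test_2
```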
> >>>
> >>>
> >>>                Also do you have a list of the test we should be
> >>>                running
> >>>                once we get this volume created?  Given the
> >>>                time-zone
> >>>                difference it might help if we can run a small
> >>>                battery
> >>>                of tests and post the results rather than
> >>>                test-post-new
> >>>                test-post... .
> >>>
> >>>
> >>>            This is the first time I am doing performance analysis
> >>>            on
> >>>            users as far as I remember. In our team there are
> >>>            separate
> >>>            engineers who do these tests. Ben who replied earlier is
> >>>            one
> >>>            such engineer.
> >>>
> >>>            Ben,
> >>>                Have any suggestions?
> >>>
> >>>
> >>>                Thanks
> >>>
> >>>                Pat
> >>>
> >>>
> >>>
> >>>                On 05/11/2017 12:06 PM, Pranith Kumar Karampuri
> >>>                wrote:
> >>>
> >>>                On Thu, May 11, 2017 at 9:32 PM, Pat Haley
> >>>                <phaley@xxxxxxx> wrote:
> >>>
> >>>
> >>>                    Hi Pranith,
> >>>
> >>>                    The /home partition is mounted as ext4
> >>>                    /home ext4 defaults,usrquota,grpquota   1 2
> >>>
> >>>                    The brick partitions are mounted as xfs
> >>>                    /mnt/brick1 xfs defaults 0 0
> >>>                    /mnt/brick2 xfs defaults 0 0
> >>>
> >>>                    Will this cause a problem with creating a
> >>>                    volume
> >>>                    under /home?
> >>>
> >>>
> >>>                I don't think the bottleneck is disk. Can you do the
> >>>                same tests you did on your new volume to confirm?
> >>>
> >>>
> >>>                    Pat
> >>>
> >>>
> >>>
> >>>                    On 05/11/2017 11:32 AM, Pranith Kumar Karampuri
> >>>                    wrote:
> >>>
> >>>                    On Thu, May 11, 2017 at 8:57 PM, Pat Haley
> >>>                    <phaley@xxxxxxx> wrote:
> >>>
> >>>
> >>>                        Hi Pranith,
> >>>
> >>>                        Unfortunately, we don't have similar
> >>>                        hardware
> >>>                        for a small scale test.  All we have is
> >>>                        our
> >>>                        production hardware.
> >>>
> >>>
> >>>                    You said something about the /home partition which
> >>>                    has fewer disks; we can create a plain distribute
> >>>                    volume inside one of those directories. After we
> >>>                    are done, we can remove the setup. What do you say?
> >>>
> >>>
> >>>                        Pat
> >>>
> >>>
> >>>
> >>>
> >>>                        On 05/11/2017 07:05 AM, Pranith Kumar
> >>>                        Karampuri wrote:
> >>>
> >>>                        On Thu, May 11, 2017 at 2:48 AM, Pat Haley
> >>>                        <phaley@xxxxxxx> wrote:
> >>>
> >>>
> >>>                            Hi Pranith,
> >>>
> >>>                            Since we are mounting the partitions
> >>>                            as
> >>>                            the bricks, I tried the dd test
> >>>                            writing
> >>>                            to
> >>>                            <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
> >>>                            The results without oflag=sync were
> >>>                            1.6
> >>>                            Gb/s (faster than gluster but not as
> >>>                            fast
> >>>                            as I was expecting given the 1.2 Gb/s
> >>>                            to
> >>>                            the no-gluster area w/ fewer disks).
> >>>
> >>>
> >>>                        Okay, then 1.6Gb/s is what we need to
> >>>                        target
> >>>                        for, considering your volume is just
> >>>                        distribute. Is there any way you can do
> >>>                        tests
> >>>                        on similar hardware but at a small scale?
> >>>                        Just so we can run the workload to learn
> >>>                        more
> >>>                        about the bottlenecks in the system? We
> >>>                        can
> >>>                        probably try to get the speed to 1.2Gb/s
> >>>                        on
> >>>                        your /home partition you were telling me
> >>>                        yesterday. Let me know if that is
> >>>                        something
> >>>                        you are okay to do.
> >>>
> >>>
> >>>                            Pat
> >>>
> >>>
> >>>
> >>>                            On 05/10/2017 01:27 PM, Pranith Kumar
> >>>                            Karampuri wrote:
> >>>
> >>>                            On Wed, May 10, 2017 at 10:15 PM, Pat
> >>>                            Haley <phaley@xxxxxxx> wrote:
> >>>
> >>>
> >>>                                Hi Pranith,
> >>>
> >>>                                Not entirely sure (this isn't my
> >>>                                area of expertise). I'll run
> >>>                                your
> >>>                                answer by some other people who
> >>>                                are
> >>>                                more familiar with this.
> >>>
> >>>                                I am also uncertain about how to
> >>>                                interpret the results when we
> >>>                                also
> >>>                                add the dd tests writing to the
> >>>                                /home area (no gluster, still on
> >>>                                the
> >>>                                same machine)
> >>>
> >>>                                  * dd test without oflag=sync
> >>>                                    (rough average of multiple
> >>>                                    tests)
> >>>                                      o gluster w/ fuse mount :
> >>>                                      570
> >>>                                      Mb/s
> >>>                                      o gluster w/ nfs mount:
> >>>                                      390
> >>>                                      Mb/s
> >>>                                      o nfs (no gluster):  1.2
> >>>                                      Gb/s
> >>>                                  * dd test with oflag=sync
> >>>                                  (rough
> >>>                                    average of multiple tests)
> >>>                                      o gluster w/ fuse mount:
> >>>                                      5
> >>>                                      Mb/s
> >>>                                      o gluster w/ nfs mount:
> >>>                                      200
> >>>                                      Mb/s
> >>>                                      o nfs (no gluster): 20
> >>>                                      Mb/s
> >>>
> >>>                                Given that the non-gluster area
> >>>                                is
> >>>                                a
> >>>                                RAID-6 of 4 disks while each
> >>>                                brick
> >>>                                of the gluster area is a RAID-6
> >>>                                of
> >>>                                32 disks, I would naively expect
> >>>                                the
> >>>                                writes to the gluster area to be
> >>>                                roughly 8x faster than to the
> >>>                                non-gluster.
> >>>
> >>>
> >>>                            I think a better test is to try and
> >>>                            write to a file using nfs without any
> >>>                            gluster to a location that is not inside
> >>>                            the brick but some other location that
> >>>                            is on the same disk(s). If you are
> >>>                            mounting the partition as the brick,
> >>>                            then we can write to a file inside the
> >>>                            .glusterfs directory, something like
> >>>                            <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
> >>>
> >>>
> >>>
> >>>                                I still think we have a speed
> >>>                                issue,
> >>>                                I can't tell if fuse vs nfs is
> >>>                                part
> >>>                                of the problem.
> >>>
> >>>
> >>>                            I got interested in the post because I
> >>>                            read that fuse speed is less than nfs
> >>>                            speed, which is counter-intuitive to my
> >>>                            understanding. So I wanted clarifications.
> >>>                            Now that I got my clarifications
> >>>                            where
> >>>                            fuse outperformed nfs without sync,
> >>>                            we
> >>>                            can resume testing as described
> >>>                            above
> >>>                            and try to find what it is. Based on
> >>>                            your email-id I am guessing you are
> >>>                            from
> >>>                            Boston and I am from Bangalore so if
> >>>                            you
> >>>                            are okay with doing this debugging
> >>>                            for
> >>>                            multiple days because of timezones,
> >>>                            I
> >>>                            will be happy to help. Please be a
> >>>                            bit
> >>>                            patient with me, I am under a
> >>>                            release
> >>>                            crunch but I am very curious with
> >>>                            the
> >>>                            problem you posted.
> >>>
> >>>                                Was there anything useful in the
> >>>                                profiles?
> >>>
> >>>
> >>>                            Unfortunately profiles didn't help
> >>>                            me
> >>>                            much, I think we are collecting the
> >>>                            profiles from an active volume, so
> >>>                            it
> >>>                            has a lot of information that is not
> >>>                            pertaining to dd so it is difficult
> >>>                            to
> >>>                            find the contributions of dd. So I
> >>>                            went
> >>>                            through your post again and found
> >>>                            something I didn't pay much
> >>>                            attention
> >>>                            to
> >>>                            earlier i.e. oflag=sync, so did my
> >>>                            own
> >>>                            tests on my setup with FUSE so sent
> >>>                            that
> >>>                            reply.
> >>>
> >>>
> >>>                                Pat
> >>>
> >>>
> >>>
> >>>                                On 05/10/2017 12:15 PM, Pranith
> >>>                                Kumar Karampuri wrote:
> >>>
> >>>                                Okay good. At least this
> >>>                                validates
> >>>                                my doubts. Handling O_SYNC in
> >>>                                gluster NFS and fuse is a bit
> >>>                                different.
> >>>                                When an application opens a file
> >>>                                with O_SYNC on a fuse mount, each
> >>>                                write syscall has to be written to
> >>>                                disk as part of the syscall, whereas
> >>>                                in the case of NFS there is no
> >>>                                concept of open. NFS performs the
> >>>                                write through a handle saying it
> >>>                                needs to be a synchronous write, so
> >>>                                the write() syscall is performed
> >>>                                first and then it performs fsync().
> >>>                                So a write on an fd with O_SYNC
> >>>                                becomes write+fsync. I am suspecting
> >>>                                that when multiple threads do this
> >>>                                write+fsync() operation on the same
> >>>                                file, multiple writes are batched
> >>>                                together to be written to disk, so
> >>>                                my guess is that the throughput on
> >>>                                the disk increases.
> >>>
> >>>                                Does it answer your doubts?
> >>>
> >>>                                On Wed, May 10, 2017 at 9:35 PM,
> >>>                                Pat Haley <phaley@xxxxxxx> wrote:
> >>>
> >>>
> >>>                                    Without the oflag=sync and
> >>>                                    only
> >>>                                    a single test of each, the
> >>>                                    FUSE
> >>>                                    is going faster than NFS:
> >>>
> >>>                                    FUSE:
> >>>                                    mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
> >>>                                    4096+0 records in
> >>>                                    4096+0 records out
> >>>                                    4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
> >>>
> >>>                                    NFS:
> >>>                                    mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
> >>>                                    4096+0 records in
> >>>                                    4096+0 records out
> >>>                                    4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
> >>>
> >>>
> >>>
> >>>                                    On 05/10/2017 11:53 AM, Pranith
> >>>                                    Kumar Karampuri wrote:
> >>>
> >>>                                    Could you let me know the speed
> >>>                                    without oflag=sync on both the
> >>>                                    mounts? No need to collect
> >>>                                    profiles.
> >>>
> >>>                                    On Wed, May 10, 2017 at 9:17 PM,
> >>>                                    Pat Haley <phaley@xxxxxxx> wrote:
> >>>
> >>>
> >>>                                        Here is what I see now:
> >>>
> >>>                                        [root@mseas-data2 ~]# gluster volume info
> >>>
> >>>                                        Volume Name: data-volume
> >>>                                        Type: Distribute
> >>>                                        Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> >>>                                        Status: Started
> >>>                                        Number of Bricks: 2
> >>>                                        Transport-type: tcp
> >>>                                        Bricks:
> >>>                                        Brick1: mseas-data2:/mnt/brick1
> >>>                                        Brick2: mseas-data2:/mnt/brick2
> >>>                                        Options Reconfigured:
> >>>                                        diagnostics.count-fop-hits: on
> >>>                                        diagnostics.latency-measurement: on
> >>>                                        nfs.exports-auth-enable: on
> >>>                                        diagnostics.brick-sys-log-level: WARNING
> >>>                                        performance.readdir-ahead: on
> >>>                                        nfs.disable: on
> >>>                                        nfs.export-volumes: off
> >>>
> >>>
> >>>
> >>>                                        On 05/10/2017 11:44 AM,
> >>>                                        Pranith Kumar Karampuri wrote:
> >>>
> >>>                                        Is this the volume info you
> >>>                                        have?
> >>>
> >>>                                        > [root at mseas-data2 ~]# gluster volume info
> >>>                                        >
> >>>                                        > Volume Name: data-volume
> >>>                                        > Type: Distribute
> >>>                                        > Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> >>>                                        > Status: Started
> >>>                                        > Number of Bricks: 2
> >>>                                        > Transport-type: tcp
> >>>                                        > Bricks:
> >>>                                        > Brick1: mseas-data2:/mnt/brick1
> >>>                                        > Brick2: mseas-data2:/mnt/brick2
> >>>                                        > Options Reconfigured:
> >>>                                        > performance.readdir-ahead: on
> >>>                                        > nfs.disable: on
> >>>                                        > nfs.export-volumes: off
> >>>                                        I copied this from an old
> >>>                                        thread from 2016. This is a
> >>>                                        distribute volume. Did you
> >>>                                        change any of the options in
> >>>                                        between?
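One way to check what changed (a sketch; the capture file names are hypothetical) is to save `gluster volume info` output at each point in time and diff the captures, since the "Options Reconfigured" section lists only options that differ from their defaults:

```shell
# Capture the volume's current non-default options.
# "data-volume" is the volume name from the output above;
# the file names are placeholders.
gluster volume info data-volume > volinfo-now.txt

# Compare against a capture saved earlier (e.g. from the 2016 thread).
diff volinfo-2016.txt volinfo-now.txt
```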
> >>>
> >>>                                        --
> >>>
> >>>                                        -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> >>>                                        Pat Haley                          Email:  phaley@xxxxxxx
> >>>                                        Center for Ocean Engineering       Phone:  (617) 253-6824
> >>>                                        Dept. of Mechanical Engineering    Fax:    (617) 253-8125
> >>>                                        MIT, Room 5-213                    http://web.mit.edu/phaley/www/
> >>>                                        77 Massachusetts Avenue
> >>>                                        Cambridge, MA  02139-4301
> >>>
> >>>                                    --
> >>>                                    Pranith
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Gluster-users mailing list
> >>> Gluster-users@gluster.org
> >>> http://lists.gluster.org/mailman/listinfo/gluster-users
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >
> >
> >
> >
> >
> >
> >
> 
> 
> -- 
> Pranith

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users
