On Tue, Jun 27, 2017 at 10:17:40AM +0530, Pranith Kumar Karampuri wrote:
> On Mon, Jun 26, 2017 at 7:40 PM, Pat Haley <phaley@xxxxxxx> wrote:
> >
> > Hi All,
> >
> > Decided to try another test of gluster mounted via FUSE vs gluster
> > mounted via NFS, this time using the software we run in production
> > (i.e. our ocean model writing a netCDF file).
> >
> > gluster mounted via NFS: the run took 2.3 hr
> >
> > gluster mounted via FUSE: the run took 44.2 hr
> >
> > The only problem with using gluster mounted via NFS is that it does
> > not respect the group write permissions, which we need.
> >
> > We have an exercise coming up in a couple of weeks. It seems to me
> > that in order to improve our write times before then, it would be
> > good to solve the group write permissions for gluster mounted via
> > NFS now. We can then revisit gluster mounted via FUSE afterwards.
> >
> > What information would you need to help us force gluster mounted
> > via NFS to respect the group write permissions?
>
> +Niels, +Jiffin
>
> I added 2 more guys who work on NFS to check why this problem happens
> in your environment. Let's see what information they may need to find
> the problem and solve this issue.

Hi Pat,

depending on the number of groups that a user is part of, you may need
to change some volume options. A complete description of the
limitations on the number of groups can be found here:

https://github.com/gluster/glusterdocs/blob/master/Administrator%20Guide/Handling-of-users-with-many-groups.md

HTH,
Niels
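The options involved are typically the two below, shown as a sketch
against the data-volume name that appears later in this thread;
availability and behavior depend on your Gluster version, so check the
document above before applying:

# resolve a user's groups on the bricks instead of trusting the groups
# carried in the RPC request (for users in more than ~93 groups):
gluster volume set data-volume server.manage-gids on

# have the gluster NFS server look up auxiliary groups itself, working
# around the 16-group limit of AUTH_SYS in the NFS protocol:
gluster volume set data-volume nfs.server-aux-gids on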
> > Thanks
> >
> > Pat
> >
> > On 06/24/2017 01:43 AM, Pranith Kumar Karampuri wrote:
> >
> > On Fri, Jun 23, 2017 at 9:10 AM, Pranith Kumar Karampuri
> > <pkarampu@xxxxxxxxxx> wrote:
> >>
> >> On Fri, Jun 23, 2017 at 2:23 AM, Pat Haley <phaley@xxxxxxx> wrote:
> >>>
> >>> Hi,
> >>>
> >>> Today we experimented with some of the FUSE options that we found
> >>> in the list.
> >>>
> >>> Changing these options had no effect:
> >>>
> >>> gluster volume set test-volume performance.cache-max-file-size 2MB
> >>> gluster volume set test-volume performance.cache-refresh-timeout 4
> >>> gluster volume set test-volume performance.cache-size 256MB
> >>> gluster volume set test-volume performance.write-behind-window-size 4MB
> >>> gluster volume set test-volume performance.write-behind-window-size 8MB
> >>
> >> This is a good coincidence; I am meeting with the write-behind
> >> maintainer (+Raghavendra G) today for the same doubt. I think we
> >> will have something by EOD IST. I will update you.
> >
> > Sorry, forgot to update you. It seems like there is a bug in
> > write-behind and the Facebook guys sent a patch
> > http://review.gluster.org/16079 to fix the same. But even with that
> > I am not seeing any improvement. Maybe I am doing something wrong.
> > Will update you if I find anything more.
>
> >>> Changing the following option from its default value made the
> >>> speed slower:
> >>>
> >>> gluster volume set test-volume performance.write-behind off (on by default)
> >>>
> >>> Changing the following options initially appeared to give a 10%
> >>> increase in speed, but this vanished in subsequent tests (we think
> >>> the apparent increase may have been due to a lighter workload on
> >>> the computer from other users):
> >>>
> >>> gluster volume set test-volume performance.stat-prefetch on
> >>> gluster volume set test-volume client.event-threads 4
> >>> gluster volume set test-volume server.event-threads 4
> >>>
> >>> Can anything be gleaned from these observations? Are there other
> >>> things we can try?
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>> On 06/20/2017 12:06 PM, Pat Haley wrote:
> >>>
> >>> Hi Ben,
> >>>
> >>> Sorry this took so long, but we had a real-time forecasting
> >>> exercise last week and I could only get to this now.
> >>>
> >>> Backend Hardware/OS:
> >>>
> >>> - Much of the information on our back-end system is included at
> >>>   the top of
> >>>   http://lists.gluster.org/pipermail/gluster-users/2017-April/030529.html
> >>> - The specific model of the hard disks is Seagate ENTERPRISE
> >>>   CAPACITY V.4 6TB (ST6000NM0024). The rated interface speed is
> >>>   6Gb/s.
> >>> - Note: there is one physical server that hosts both the NFS and
> >>>   the GlusterFS areas.
> >>>
> >>> Latest tests:
> >>>
> >>> I have had time to run one of the dd tests you requested against
> >>> the underlying XFS FS. The median rate was 170 MB/s. The dd
> >>> results and iostat record are in
> >>>
> >>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestXFS/
> >>>
> >>> I'll add tests for the other brick and for the NFS area later.
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>> On 06/12/2017 06:06 PM, Ben Turner wrote:
> >>>
> >>> OK, you are correct, you have a pure distributed volume, i.e. no
> >>> replication overhead. So normally for pure dist I use:
> >>>
> >>> throughput = slowest of disks / NIC * .6-.7
> >>>
> >>> In your case we have:
> >>>
> >>> 1200 * .6 = 720
> >>>
> >>> So you are seeing a little less throughput than I would expect in
> >>> your configuration. What I like to do here is:
> >>>
> >>> - First, tell me more about your back-end storage: will it sustain
> >>>   1200 MB/sec? What kind of HW? How many disks? What type and
> >>>   specs are the disks? What kind of RAID are you using?
> >>>
> >>> - Second, can you refresh me on your workload? Are you doing
> >>>   reads, writes, or both? If both, what mix? Since we are using dd
> >>>   I assume you are working with large-file sequential I/O, is this
> >>>   correct?
> >>>
> >>> - Run some dd tests on the back-end XFS FS. I normally have
> >>>   /xfs-mount/gluster-brick; if you have something similar just
> >>>   mkdir on the XFS -> /xfs-mount/my-test-dir. Inside the test dir
> >>>   run:
> >>>
> >>> If you are focusing on a write workload run:
> >>>
> >>> # dd if=/dev/zero of=/xfs-mount/file bs=1024k count=10000 conv=fdatasync
> >>>
> >>> If you are focusing on a read workload run:
> >>>
> >>> # echo 3 > /proc/sys/vm/drop_caches
> >>> # dd if=/gluster-mount/file of=/dev/null bs=1024k count=10000
> >>>
> >>> ** MAKE SURE TO DROP CACHE IN BETWEEN READS!! **
> >>>
> >>> Run this in a loop similar to how you did in:
> >>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
> >>>
> >>> Run this on both servers one at a time, and if you are running on
> >>> a SAN then run again on both at the same time.
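A minimal loop for the write case, assuming a test directory like the
/xfs-mount/my-test-dir above (hypothetical paths; adjust to your own
bricks), could look like:

# run 12 trials, dropping the page cache first so each run starts cold;
# dd prints its throughput on stderr, so append that to a log
for i in $(seq 1 12); do
    echo 3 > /proc/sys/vm/drop_caches
    dd if=/dev/zero of=/xfs-mount/my-test-dir/ddfile bs=1024k count=10000 conv=fdatasync 2>> dd-$(hostname).log
    rm -f /xfs-mount/my-test-dir/ddfile
done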
> >>> While this is running, gather iostat for me:
> >>>
> >>> # iostat -c -m -x 1 > iostat-$(hostname).txt
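One way to capture iostat for exactly the duration of a run is to start
it in the background and stop it when dd finishes (a sketch, reusing
the hypothetical test path from above):

# start the collector and remember its PID
iostat -c -m -x 1 > iostat-$(hostname).txt &
IOSTAT_PID=$!
# run the write test while iostat samples once per second
dd if=/dev/zero of=/xfs-mount/my-test-dir/ddfile bs=1024k count=10000 conv=fdatasync
# stop the collector
kill $IOSTAT_PID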
> >>> Let's see how the back end performs on both servers while
> >>> capturing iostat, then see how the same workload / data looks on
> >>> gluster.
> >>>
> >>> - Last thing: when you run your kernel NFS tests, are you using
> >>>   the same filesystem / storage you are using for the gluster
> >>>   bricks? I want to be sure we have an apples-to-apples comparison
> >>>   here.
> >>>
> >>> -b
> >>>
> >>> ----- Original Message -----
> >>> From: "Pat Haley" <phaley@xxxxxxx>
> >>> To: "Ben Turner" <bturner@xxxxxxxxxx>
> >>> Sent: Monday, June 12, 2017 5:18:07 PM
> >>> Subject: Re: Slow write times to gluster disk
> >>>
> >>> Hi Ben,
> >>>
> >>> Here is the output:
> >>>
> >>> [root@mseas-data2 ~]# gluster volume info
> >>>
> >>> Volume Name: data-volume
> >>> Type: Distribute
> >>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> >>> Status: Started
> >>> Number of Bricks: 2
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: mseas-data2:/mnt/brick1
> >>> Brick2: mseas-data2:/mnt/brick2
> >>> Options Reconfigured:
> >>> nfs.exports-auth-enable: on
> >>> diagnostics.brick-sys-log-level: WARNING
> >>> performance.readdir-ahead: on
> >>> nfs.disable: on
> >>> nfs.export-volumes: off
> >>>
> >>> On 06/12/2017 05:01 PM, Ben Turner wrote:
> >>>
> >>> What is the output of gluster v info? That will tell us more about
> >>> your config.
> >>>
> >>> -b
> >>>
> >>> ----- Original Message -----
> >>> From: "Pat Haley" <phaley@xxxxxxx>
> >>> To: "Ben Turner" <bturner@xxxxxxxxxx>
> >>> Sent: Monday, June 12, 2017 4:54:00 PM
> >>> Subject: Re: Slow write times to gluster disk
> >>>
> >>> Hi Ben,
> >>>
> >>> I guess I'm confused about what you mean by replication. If I look
> >>> at the underlying bricks I only ever have a single copy of any
> >>> file. It either resides on one brick or the other (directories
> >>> exist on both bricks but not files). We are not using gluster for
> >>> redundancy (or at least that wasn't our intent). Is that what you
> >>> meant by replication, or is it something else?
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>> On 06/12/2017 04:28 PM, Ben Turner wrote:
> >>>
> >>> ----- Original Message -----
> >>> From: "Pat Haley" <phaley@xxxxxxx>
> >>> To: "Ben Turner" <bturner@xxxxxxxxxx>, "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> >>> Cc: "Ravishankar N" <ravishankar@xxxxxxxxxx>, gluster-users@xxxxxxxxxxx,
> >>> "Steve Postma" <SPostma@xxxxxxxxxxxx>
> >>> Sent: Monday, June 12, 2017 2:35:41 PM
> >>> Subject: Re: Slow write times to gluster disk
> >>>
> >>> Hi Guys,
> >>>
> >>> I was wondering what our next steps should be to solve the slow
> >>> write times.
> >>>
> >>> Recently I was debugging a large code and writing a lot of output
> >>> at every time step. When I tried writing to our gluster disks, it
> >>> was taking over a day to do a single time step, whereas if I had
> >>> the same program (same hardware, network) write to our nfs disk
> >>> the time per time-step was about 45 minutes. What we are shooting
> >>> for here would be to have similar times to either gluster or nfs.
>
> >>> I can see in your test:
> >>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
> >>>
> >>> You averaged ~600 MB/sec (expected for replica 2 with 10G: {~1200
> >>> MB/sec} / #replicas{2} = 600). Gluster does client-side
> >>> replication, so with replica 2 you will only ever see 1/2 the
> >>> speed of the slowest part of the stack (NW, disk, RAM, CPU). This
> >>> is usually NW or disk, and 600 is normally a best case. Now, in
> >>> your output I do see instances where you went down to 200 MB/sec.
> >>> I can only explain this in three ways:
> >>>
> >>> 1. You are not using conv=fdatasync and writes are actually going
> >>> to page cache and then being flushed to disk. During the fsync the
> >>> memory is not yet available and the disks are busy flushing dirty
> >>> pages.
> >>> 2. Your storage RAID group is shared across multiple LUNs (like in
> >>> a SAN) and when write times are slow the RAID group is busy
> >>> servicing other LUNs.
> >>> 3. Gluster bug / config issue / some other unknown unknown.
> >>>
> >>> So I see 2 issues here:
> >>>
> >>> 1. NFS does in 45 minutes what gluster does in 24 hours.
> >>> 2. Sometimes your throughput drops dramatically.
> >>>
> >>> WRT #1 - have a look at my estimates above. My formula for
> >>> guesstimating gluster perf is: throughput = NIC throughput or
> >>> storage (whichever is slower) / # replicas * overhead (figure .7
> >>> or .8). Also, the larger the record size the better for glusterfs
> >>> mounts; I normally like to be at LEAST 64k, up to 1024k:
> >>>
> >>> # dd if=/dev/zero of=/gluster-mount/file bs=1024k count=10000 conv=fdatasync
> >>>
> >>> WRT #2 - Again, I question your testing and your storage config.
> >>> Try using conv=fdatasync for your DDs, use a larger record size,
> >>> and make sure that your back-end storage is not causing your
> >>> slowdowns. Also remember that with replica 2 you will take a ~50%
> >>> hit on writes because the client uses 50% of its bandwidth to
> >>> write to one replica and 50% to the other.
> >>>
> >>> -b
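As a plain worked instance of that formula, using this thread's 1200
MB/sec figure and the .7 overhead factor:

    replica 2:       1200 / 2 * .7 = 420 MB/sec (600 is the no-overhead best case)
    pure distribute: 1200 / 1 * .7 = 840 MB/sec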
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>> On 06/02/2017 01:07 AM, Ben Turner wrote:
> >>>
> >>> Are you sure using conv=sync is what you want? I normally use
> >>> conv=fdatasync; I'll look up the difference between the two and
> >>> see if it affects your test.
> >>>
> >>> -b
> >>>
> >>> ----- Original Message -----
> >>> From: "Pat Haley" <phaley@xxxxxxx>
> >>> To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> >>> Cc: "Ravishankar N" <ravishankar@xxxxxxxxxx>, gluster-users@xxxxxxxxxxx,
> >>> "Steve Postma" <SPostma@xxxxxxxxxxxx>, "Ben Turner" <bturner@xxxxxxxxxx>
> >>> Sent: Tuesday, May 30, 2017 9:40:34 PM
> >>> Subject: Re: Slow write times to gluster disk
> >>>
> >>> Hi Pranith,
> >>>
> >>> The "dd" command was:
> >>>
> >>> dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
> >>>
> >>> There were 2 instances where dd reported 22 seconds. The output
> >>> from the dd tests is in
> >>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
> >>>
> >>> Pat
> >>>
> >>> On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
> >>>
> >>> Pat,
> >>> What is the command you used? As per the following output, it
> >>> seems like at least one write operation took 16 seconds, which is
> >>> really bad:
> >>>
> >>> 96.39    1165.10 us    89.00 us    16487014.00 us    393212    WRITE
> >>>
> >>> On Tue, May 30, 2017 at 10:36 PM, Pat Haley <phaley@xxxxxxx> wrote:
> >>>
> >>> Hi Pranith,
> >>>
> >>> I ran the same 'dd' test both in the gluster test volume and in
> >>> the .glusterfs directory of each brick. The median results (12 dd
> >>> trials in each test) are similar to before:
> >>>
> >>> * gluster test volume: 586.5 MB/s
> >>> * bricks (in .glusterfs): 1.4 GB/s
> >>>
> >>> The profile for the gluster test-volume is in
> >>> http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>> On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
> >>>
> >>> Let's start with the same 'dd' test we were testing with, to see
> >>> what the numbers are. Please provide profile numbers for the same.
> >>> From there on we will start tuning the volume to see what we can
> >>> do.
> >>>
> >>> On Tue, May 30, 2017 at 9:16 PM, Pat Haley <phaley@xxxxxxx> wrote:
> >>>
> >>> Hi Pranith,
> >>>
> >>> Thanks for the tip. We now have the gluster volume mounted under
> >>> /home. What tests do you recommend we run?
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>> On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
> >>>
> >>> On Tue, May 16, 2017 at 9:20 PM, Pat Haley <phaley@xxxxxxx> wrote:
> >>>
> >>> Hi Pranith,
> >>>
> >>> Sorry for the delay. I never received your reply (but I did
> >>> receive Ben Turner's follow-up to your reply). So we tried to
> >>> create a gluster volume under /home using different variations of
> >>>
> >>> gluster volume create test-volume
> >>> mseas-data2:/home/gbrick_test_1
> >>> mseas-data2:/home/gbrick_test_2 transport tcp
> >>>
> >>> However we keep getting errors of the form
> >>>
> >>> Wrong brick type: transport, use <HOSTNAME>:<export-dir-abs-path>
> >>>
> >>> Any thoughts on what we're doing wrong?
> >>>
> >>> You should give transport tcp at the beginning, I think. Anyway,
> >>> transport tcp is the default, so there is no need to specify it;
> >>> remove those two words from the CLI.
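That is, the transport keyword (when given at all) precedes the brick
list, and since tcp is the default it can simply be dropped:

# gluster volume create test-volume \
    mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2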
> >>> Also, do you have a list of the tests we should be running once
> >>> we get this volume created? Given the time-zone difference it
> >>> might help if we can run a small battery of tests and post the
> >>> results, rather than a test-post-new-test-post... cycle.
> >>>
> >>> This is the first time I am doing performance analysis with users,
> >>> as far as I remember. In our team there are separate engineers who
> >>> do these tests. Ben, who replied earlier, is one such engineer.
> >>>
> >>> Ben,
> >>> Have any suggestions?
> >>>
> >>> Thanks
> >>>
> >>> Pat
> >>>
> >>> On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
> >>>
> >>> On Thu, May 11, 2017 at 9:32 PM, Pat Haley <phaley@xxxxxxx> wrote:
> >>>
> >>> Hi Pranith,
> >>>
> >>> The /home partition is mounted as ext4:
> >>> /home ext4 defaults,usrquota,grpquota 1 2
> >>>
> >>> The brick partitions are mounted as xfs:
> >>> /mnt/brick1 xfs defaults 0 0
> >>> /mnt/brick2 xfs defaults 0 0
> >>>
> >>> Will this cause a problem with creating a volume under /home?
> >>>
> >>> I don't think the bottleneck is disk. Can you do the same tests
> >>> you did on your new volume to confirm?
> >>>
> >>> Pat
> >>>
> >>> On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
> >>>
> >>> On Thu, May 11, 2017 at 8:57 PM, Pat Haley <phaley@xxxxxxx> wrote:
> >>>
> >>> Hi Pranith,
> >>>
> >>> Unfortunately, we don't have similar hardware for a small-scale
> >>> test. All we have is our production hardware.
> >>>
> >>> You said something about a /home partition which has fewer disks;
> >>> we can create a plain distribute volume inside one of those
> >>> directories. After we are done, we can remove the setup. What do
> >>> you say?
> >>>
> >>> Pat
> >>>
> >>> On 05/11/2017 07:05 AM, Pranith Kumar Karampuri wrote:
> >>>
> >>> On Thu, May 11, 2017 at 2:48 AM, Pat Haley <phaley@xxxxxxx> wrote:
> >>>
> >>> Hi Pranith,
> >>>
> >>> Since we are mounting the partitions as the bricks, I tried the dd
> >>> test writing to
> >>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
> >>> The results without oflag=sync were 1.6 Gb/s (faster than gluster
> >>> but not as fast as I was expecting given the 1.2 Gb/s to the
> >>> no-gluster area w/ fewer disks).
> >>>
> >>> Okay, then 1.6 Gb/s is what we need to target for, considering
> >>> your volume is just distribute. Is there any way you can do tests
> >>> on similar hardware but at a small scale? Just so we can run the
> >>> workload to learn more about the bottlenecks in the system? We can
> >>> probably try to get the speed to 1.2 Gb/s on the /home partition
> >>> you were telling me about yesterday. Let me know if that is
> >>> something you are okay to do.
> >>>
> >>> Pat
> >>>
> >>> On 05/10/2017 01:27 PM, Pranith Kumar Karampuri wrote:
> >>>
> >>> On Wed, May 10, 2017 at 10:15 PM, Pat Haley <phaley@xxxxxxx> wrote:
> >>>
> >>> Hi Pranith,
> >>>
> >>> Not entirely sure (this isn't my area of expertise). I'll run your
> >>> answer by some other people who are more familiar with this.
> >>> I am also uncertain about how to interpret the results when we
> >>> also add the dd tests writing to the /home area (no gluster, still
> >>> on the same machine):
> >>>
> >>> * dd test without oflag=sync (rough average of multiple tests)
> >>>   o gluster w/ fuse mount: 570 Mb/s
> >>>   o gluster w/ nfs mount: 390 Mb/s
> >>>   o nfs (no gluster): 1.2 Gb/s
> >>> * dd test with oflag=sync (rough average of multiple tests)
> >>>   o gluster w/ fuse mount: 5 Mb/s
> >>>   o gluster w/ nfs mount: 200 Mb/s
> >>>   o nfs (no gluster): 20 Mb/s
> >>>
> >>> Given that the non-gluster area is a RAID-6 of 4 disks while each
> >>> brick of the gluster area is a RAID-6 of 32 disks, I would naively
> >>> expect the writes to the gluster area to be roughly 8x faster than
> >>> to the non-gluster.
> >>>
> >>> I think a better test is to try and write to a file using nfs
> >>> without any gluster, to a location that is not inside the brick
> >>> but some other location that is on the same disk(s). If you are
> >>> mounting the partition as the brick, then we can write to a file
> >>> inside the .glusterfs directory, something like
> >>> <brick-path>/.glusterfs/<file-to-be-removed-after-test>.
> >>>
> >>> I still think we have a speed issue; I can't tell if fuse vs nfs
> >>> is part of the problem.
> >>>
> >>> I got interested in the post because I read that fuse speed is
> >>> less than nfs speed, which is counter-intuitive to my
> >>> understanding. So I wanted clarifications. Now that I got my
> >>> clarifications, where fuse outperformed nfs without sync, we can
> >>> resume testing as described above and try to find what it is.
> >>> Based on your email-id I am guessing you are from Boston and I am
> >>> from Bangalore, so if you are okay with doing this debugging for
> >>> multiple days because of timezones, I will be happy to help.
> >>> Please be a bit patient with me; I am under a release crunch, but
> >>> I am very curious about the problem you posted.
> >>>
> >>> Was there anything useful in the profiles?
> >>>
> >>> Unfortunately the profiles didn't help me much. I think we are
> >>> collecting the profiles from an active volume, so they have a lot
> >>> of information that does not pertain to dd, and it is difficult to
> >>> find the contributions of dd. So I went through your post again
> >>> and found something I didn't pay much attention to earlier, i.e.
> >>> oflag=sync, so I did my own tests on my setup with FUSE and sent
> >>> that reply.
> >>>
> >>> Pat
> >>>
> >>> On 05/10/2017 12:15 PM, Pranith Kumar Karampuri wrote:
> >>>
> >>> Okay, good. At least this validates my doubts. Handling O_SYNC in
> >>> gluster NFS and fuse is a bit different. When an application opens
> >>> a file with O_SYNC on a fuse mount, then each write syscall has to
> >>> be written to disk as part of the syscall, whereas in the case of
> >>> NFS, there is no concept of open.
> >>> NFS performs a write through a handle saying it needs to be a
> >>> synchronous write, so a write() syscall is performed first and
> >>> then it performs fsync(); so a write on an fd with O_SYNC becomes
> >>> write+fsync. I am suspecting that when multiple threads do this
> >>> write+fsync() operation on the same file, multiple writes are
> >>> batched together to be written to disk, so the throughput on the
> >>> disk increases, is my guess.
> >>>
> >>> Does it answer your doubts?
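The cost difference between the two behaviors can be felt from any
mount with dd alone (a sketch, with a throwaway filename; oflag=sync
makes every write syscall synchronous, while conv=fdatasync issues a
single flush after all the writes):

# sync each 1 MB write individually (O_SYNC semantics):
dd if=/dev/zero of=testfile bs=1048576 count=4096 oflag=sync
# write everything first, then flush once at the end:
dd if=/dev/zero of=testfile bs=1048576 count=4096 conv=fdatasync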
> >>> On Wed, May 10, 2017 at 9:35 PM, Pat Haley <phaley@xxxxxxx> wrote:
> >>>
> >>> Without the oflag=sync, and with only a single test of each, the
> >>> FUSE is going faster than NFS:
> >>>
> >>> FUSE:
> >>> mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
> >>> 4096+0 records in
> >>> 4096+0 records out
> >>> 4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
> >>>
> >>> NFS:
> >>> mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
> >>> 4096+0 records in
> >>> 4096+0 records out
> >>> 4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
> >>>
> >>> On 05/10/2017 11:53 AM, Pranith Kumar Karampuri wrote:
> >>>
> >>> Could you let me know the speed without oflag=sync on both the
> >>> mounts? No need to collect profiles.
> >>>
> >>> On Wed, May 10, 2017 at 9:17 PM, Pat Haley <phaley@xxxxxxx> wrote:
> >>>
> >>> Here is what I see now:
> >>>
> >>> [root@mseas-data2 ~]# gluster volume info
> >>>
> >>> Volume Name: data-volume
> >>> Type: Distribute
> >>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> >>> Status: Started
> >>> Number of Bricks: 2
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: mseas-data2:/mnt/brick1
> >>> Brick2: mseas-data2:/mnt/brick2
> >>> Options Reconfigured:
> >>> diagnostics.count-fop-hits: on
> >>> diagnostics.latency-measurement: on
> >>> nfs.exports-auth-enable: on
> >>> diagnostics.brick-sys-log-level: WARNING
> >>> performance.readdir-ahead: on
> >>> nfs.disable: on
> >>> nfs.export-volumes: off
> >>>
> >>> On 05/10/2017 11:44 AM, Pranith Kumar Karampuri wrote:
> >>>
> >>> Is this the volume info you have?
> >>>
> >>> [root@mseas-data2 ~]# gluster volume info
> >>>
> >>> Volume Name: data-volume
> >>> Type: Distribute
> >>> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> >>> Status: Started
> >>> Number of Bricks: 2
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: mseas-data2:/mnt/brick1
> >>> Brick2: mseas-data2:/mnt/brick2
> >>> Options Reconfigured:
> >>> performance.readdir-ahead: on
> >>> nfs.disable: on
> >>> nfs.export-volumes: off
> >>>
> >>> I copied this from an old thread from 2016. This is a distribute
> >>> volume. Did you change any of the options in between?
> >>>
> >>> --
> >>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> >>> Pat Haley                          Email:  phaley@xxxxxxx
> >>> Center for Ocean Engineering       Phone:  (617) 253-6824
> >>> Dept. of Mechanical Engineering    Fax:    (617) 253-8125
> >>> MIT, Room 5-213                    http://web.mit.edu/phaley/www/
> >>> 77 Massachusetts Avenue
> >>> Cambridge, MA 02139-4301
> >>>
> >>> --
> >>> Pranith
>
> --
> Pranith
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users