Hi Guys,
I was wondering what our next steps should be to solve the slow write times.
Recently I was debugging a large code and writing a lot of output at
every time step. When I tried writing to our gluster disks, it was
taking over a day to do a single time step whereas if I had the same
program (same hardware, network) write to our nfs disk the time per
time-step was about 45 minutes. What we are shooting for here would be
to have similar times on either gluster or nfs.
Thanks
Pat
On 06/02/2017 01:07 AM, Ben Turner wrote:
Are you sure using conv=sync is what you want? I normally use conv=fdatasync, I'll look up the difference between the two and see if it affects your test.
-b
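For reference, the three GNU dd settings that come up in this thread behave differently. conv=sync only pads short input blocks with NULs up to the input block size and does not flush anything. conv=fdatasync issues a single fdatasync() on the output file before dd exits. oflag=sync opens the output with O_SYNC, so every write waits for the storage. A minimal sketch for comparing them side by side follows. TARGET_DIR, BS and COUNT are illustrative names, and the defaults are deliberately tiny; point TARGET_DIR at the mount under test and raise COUNT (the thread used 4096) for a real measurement.

```shell
#!/usr/bin/env bash
# Sketch: the same small write, run with three different dd sync settings.
# TARGET_DIR, BS and COUNT are illustrative defaults, not values from the thread.
TARGET_DIR=${TARGET_DIR:-$(mktemp -d)}
BS=${BS:-1048576}
COUNT=${COUNT:-4}

run_dd() {
    # $1 = label, remaining arguments = extra dd operands
    local label=$1; shift
    local out="$TARGET_DIR/zeros_$label.bin"
    local summary
    # dd prints its statistics on stderr; keep only the final summary line
    summary=$(dd if=/dev/zero of="$out" bs="$BS" count="$COUNT" "$@" 2>&1 | tail -n 1)
    echo "$label: $summary"
}

run_dd conv_sync      conv=sync        # pads short input blocks, no flush
run_dd conv_fdatasync conv=fdatasync   # one fdatasync() before dd exits
run_dd oflag_sync     oflag=sync       # O_SYNC: each write waits for the storage
```

With /dev/zero as input there are no short blocks to pad, so conv=sync changes nothing about the data written and its timing can largely reflect caching rather than the disks.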
----- Original Message -----
From: "Pat Haley" <phaley@xxxxxxx>
To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
Cc: "Ravishankar N" <ravishankar@xxxxxxxxxx>, gluster-users@xxxxxxxxxxx, "Steve Postma" <SPostma@xxxxxxxxxxxx>, "Ben Turner" <bturner@xxxxxxxxxx>
Sent: Tuesday, May 30, 2017 9:40:34 PM
Subject: Re: Slow write times to gluster disk
Hi Pranith,
The "dd" command was:
dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
There were 2 instances where dd reported 22 seconds. The output from the
dd tests is in
http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/dd_testvol_gluster.txt
Pat
On 05/30/2017 09:27 PM, Pranith Kumar Karampuri wrote:
Pat,
What is the command you used? As per the following output, it
seems like at least one write operation took 16 seconds. Which is
really bad.
96.39    1165.10 us    89.00 us    *16487014.00 us*    393212    WRITE
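The columns in this line of "gluster volume profile ... info" output are, in order, %-latency, Avg-latency, Min-latency, Max-latency, number of calls and the file operation. A small sketch that converts the quoted line into friendlier units follows. The field numbers are taken from that one line only, so treat them as an assumption for other releases or output layouts.

```shell
#!/usr/bin/env bash
# The WRITE line quoted in the thread (emphasis markers removed).
line='96.39 1165.10 us 89.00 us 16487014.00 us 393212 WRITE'

# Fields: $1 %-latency, $2 avg (us), $4 min (us), $6 max (us), $8 calls, $9 fop
summary=$(echo "$line" | awk '{
    printf "%s: avg %.2f ms, max %.2f s over %d calls", $9, $2 / 1000, $6 / 1e6, $8
}')
echo "$summary"
```

This prints "WRITE: avg 1.17 ms, max 16.49 s over 393212 calls", which is the roughly 16 second write referred to above.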
On Tue, May 30, 2017 at 10:36 PM, Pat Haley <phaley@xxxxxxx> wrote:
Hi Pranith,
I ran the same 'dd' test both in the gluster test volume and in
the .glusterfs directory of each brick. The median results (12 dd
trials in each test) are similar to before
* gluster test volume: 586.5 MB/s
* bricks (in .glusterfs): 1.4 GB/s
The profile for the gluster test-volume is in
http://mseas.mit.edu/download/phaley/GlusterUsers/TestVol/profile_testvol_gluster.txt
Thanks
Pat
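A sketch of how a median over repeated dd trials like the one reported above could be collected. DIR, BS, COUNT, TRIALS and CONV are illustrative names. The defaults are small so the script finishes quickly, and CONV=fdatasync is an assumption (the runs in this thread used conv=sync). It relies on GNU dd's summary line and computes MB/s from the elapsed seconds rather than parsing dd's own rate.

```shell
#!/usr/bin/env bash
# Illustrative defaults; the thread used count=4096 and 12 trials.
DIR=${DIR:-$(mktemp -d)}
BS=${BS:-1048576}
COUNT=${COUNT:-4}
TRIALS=${TRIALS:-12}
CONV=${CONV:-fdatasync}   # assumption: the thread's own runs used conv=sync

median() {
    # Reads one number per line on stdin and prints the median.
    sort -n | awk '{ v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            if (NR % 2) print v[(NR + 1) / 2]
            else print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

dd_rate_mbs() {
    # $1 = output file. Prints throughput in MB/s (10^6 bytes per second),
    # computed from the elapsed seconds in the last line of dd's output.
    LC_ALL=C dd if=/dev/zero of="$1" bs="$BS" count="$COUNT" conv="$CONV" 2>&1 |
        awk -F', ' -v bytes="$((BS * COUNT))" '
            { secs = $(NF - 1) + 0 }
            END { if (secs > 0) printf "%.1f\n", bytes / secs / 1e6 }'
}

for i in $(seq 1 "$TRIALS"); do
    dd_rate_mbs "$DIR/zeros_$i.bin"
done | median
```

Setting DIR to the gluster mount and then to a brick's .glusterfs directory would give the two numbers being compared here.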
On 05/30/2017 12:10 PM, Pranith Kumar Karampuri wrote:
Let's start with the same 'dd' test we were testing with, to see
what the numbers are. Please provide profile numbers for the
same. From there we will start tuning the volume to see what
we can do.
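A sketch of the sequence being asked for: start profiling, run the dd test, dump the counters, stop profiling. VOLUME is the test volume name used elsewhere in this thread; MOUNT is a hypothetical mount point, and the run/DRY_RUN wrapper is only an illustration device so the commands can be reviewed before anything is executed.

```shell
#!/usr/bin/env bash
VOLUME=${VOLUME:-test-volume}        # volume name used in this thread
MOUNT=${MOUNT:-/mnt/test-volume}     # hypothetical mount point of that volume
DRY_RUN=${DRY_RUN:-1}                # 1 = only print the commands

run() {
    # Print the command, and execute it unless DRY_RUN is 1.
    echo "+ $*"
    if [ "$DRY_RUN" != 1 ]; then "$@"; fi
}

profile_dd_test() {
    run gluster volume profile "$VOLUME" start
    # Same dd invocation as the one quoted in the thread.
    run dd if=/dev/zero count=4096 bs=1048576 of="$MOUNT/zeros.txt" conv=sync
    run gluster volume profile "$VOLUME" info
    run gluster volume profile "$VOLUME" stop
}

profile_dd_test
```

Running it with DRY_RUN=0 as root on the server executes the four steps. Profiling a volume that carries no other traffic keeps the counters attributable to dd.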
On Tue, May 30, 2017 at 9:16 PM, Pat Haley <phaley@xxxxxxx> wrote:
Hi Pranith,
Thanks for the tip. We now have the gluster volume mounted
under /home. What tests do you recommend we run?
Thanks
Pat
On 05/17/2017 05:01 AM, Pranith Kumar Karampuri wrote:
On Tue, May 16, 2017 at 9:20 PM, Pat Haley <phaley@xxxxxxx> wrote:
Hi Pranith,
Sorry for the delay. I never received your reply
(but I did receive Ben Turner's follow-up to your
reply). So we tried to create a gluster volume under
/home using different variations of
gluster volume create test-volume
mseas-data2:/home/gbrick_test_1
mseas-data2:/home/gbrick_test_2 transport tcp
However we keep getting errors of the form
Wrong brick type: transport, use
<HOSTNAME>:<export-dir-abs-path>
Any thoughts on what we're doing wrong?
I think transport tcp should be given at the beginning, before the
bricks. Anyway, transport tcp is the default, so there is no need to
specify it; just remove those two words from the command line.
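A sketch of the two forms of the create command this advice leads to. It only assembles and prints the command lines for review and does not run gluster. The placement of "transport tcp" before the brick list follows the CLI usage string as I understand it, so check "gluster volume help" on the installed release.

```shell
#!/usr/bin/env bash
VOLUME=test-volume
BRICKS="mseas-data2:/home/gbrick_test_1 mseas-data2:/home/gbrick_test_2"

# transport tcp is the default, so it can simply be left out:
CREATE_CMD="gluster volume create $VOLUME $BRICKS"

# If it is spelled out, it goes before the brick list, not after it:
CREATE_CMD_EXPLICIT="gluster volume create $VOLUME transport tcp $BRICKS"

echo "$CREATE_CMD"
echo "$CREATE_CMD_EXPLICIT"
```

In the failing command the words "transport tcp" came after the bricks, so the CLI tried to parse "transport" as another brick, which matches the "Wrong brick type: transport" error.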
Also do you have a list of the test we should be running
once we get this volume created? Given the time-zone
difference it might help if we can run a small battery
of tests and post the results rather than test-post-new
test-post... .
This is the first time I am doing performance analysis for
users, as far as I remember. In our team there are separate
engineers who do these tests. Ben, who replied earlier, is one
such engineer.
Ben,
Have any suggestions?
Thanks
Pat
On 05/11/2017 12:06 PM, Pranith Kumar Karampuri wrote:
On Thu, May 11, 2017 at 9:32 PM, Pat Haley <phaley@xxxxxxx> wrote:
Hi Pranith,
The /home partition is mounted as ext4
/home ext4 defaults,usrquota,grpquota 1 2
The brick partitions are mounted as xfs
/mnt/brick1 xfs defaults 0 0
/mnt/brick2 xfs defaults 0 0
Will this cause a problem with creating a volume
under /home?
I don't think the bottleneck is disk. Can you run the
same tests on your new volume to confirm?
Pat
On 05/11/2017 11:32 AM, Pranith Kumar Karampuri wrote:
On Thu, May 11, 2017 at 8:57 PM, Pat Haley <phaley@xxxxxxx> wrote:
Hi Pranith,
Unfortunately, we don't have similar hardware
for a small scale test. All we have is our
production hardware.
You said something about the /home partition, which has
fewer disks. We can create a plain distribute
volume inside one of those directories. After we
are done, we can remove the setup. What do you say?
Pat
On 05/11/2017 07:05 AM, Pranith Kumar
Karampuri wrote:
On Thu, May 11, 2017 at 2:48 AM, Pat Haley <phaley@xxxxxxx> wrote:
Hi Pranith,
Since we are mounting the partitions as
the bricks, I tried the dd test writing
to
<brick-path>/.glusterfs/<file-to-be-removed-after-test>.
The results without oflag=sync were 1.6
GB/s (faster than gluster but not as fast
as I was expecting given the 1.2 GB/s to
the no-gluster area w/ fewer disks).
Okay, then 1.6 GB/s is what we need to target,
considering your volume is just
distribute. Is there any way you can do tests
on similar hardware but at a small scale,
just so we can run the workload to learn more
about the bottlenecks in the system? We can
probably try to get the speed to 1.2 GB/s on
the /home partition you were telling me about
yesterday. Let me know if that is something
you are okay to do.
Pat
On 05/10/2017 01:27 PM, Pranith Kumar
Karampuri wrote:
On Wed, May 10, 2017 at 10:15 PM, Pat Haley <phaley@xxxxxxx> wrote:
Hi Pranith,
Not entirely sure (this isn't my
area of expertise). I'll run your
answer by some other people who are
more familiar with this.
I am also uncertain about how to
interpret the results when we also
add the dd tests writing to the
/home area (no gluster, still on the
same machine)
* dd test without oflag=sync (rough average of multiple tests)
    o gluster w/ fuse mount: 570 MB/s
    o gluster w/ nfs mount: 390 MB/s
    o nfs (no gluster): 1.2 GB/s
* dd test with oflag=sync (rough average of multiple tests)
    o gluster w/ fuse mount: 5 MB/s
    o gluster w/ nfs mount: 200 MB/s
    o nfs (no gluster): 20 MB/s
Given that the non-gluster area is a
RAID-6 of 4 disks while each brick
of the gluster area is a RAID-6 of
32 disks, I would naively expect the
writes to the gluster area to be
roughly 8x faster than to the
non-gluster.
I think a better test is to try to write to a file using nfs
without any gluster, to a location that is not inside the brick
but some other location on the same disk(s). If you are mounting
the partition as the brick, then we can write to a file inside
the .glusterfs directory, something like
<brick-path>/.glusterfs/<file-to-be-removed-after-test>.
I still think we have a speed issue,
I can't tell if fuse vs nfs is part
of the problem.
I got interested in the post because I read that fuse speed is
lower than nfs speed, which is counter-intuitive to my
understanding, so I wanted clarification. Now that I have it
(fuse outperformed nfs without sync), we can resume testing as
described above and try to find what it is. Based on your
email-id I am guessing you are in Boston and I am in Bangalore,
so if you are okay with this debugging taking multiple days
because of timezones, I will be happy to help. Please be a bit
patient with me; I am under a release crunch, but I am very
curious about the problem you posted.
Was there anything useful in the
profiles?
Unfortunately the profiles didn't help me much. I think we are
collecting the profiles from an active volume, so they contain a
lot of information that does not pertain to dd, which makes it
difficult to isolate the contribution of dd. So I went through
your post again and found something I hadn't paid much attention
to earlier, i.e. oflag=sync, ran my own tests on my setup with
FUSE, and sent that reply.
Pat
On 05/10/2017 12:15 PM, Pranith
Kumar Karampuri wrote:
Okay good. At least this validates
my doubts. Handling O_SYNC in
gluster NFS and fuse is a bit
different.
When an application opens a file with O_SYNC on a fuse mount,
each write syscall has to be written to disk as part of the
syscall, whereas in the case of NFS there is no concept of open.
NFS performs the write through a handle saying it needs to be a
synchronous write, so the write() syscall is performed first and
then it performs fsync(). So a write on an fd with O_SYNC becomes
write+fsync. My guess is that when multiple threads do this
write+fsync() operation on the same file, multiple writes are
batched together to be written to disk, so the throughput on the
disk increases.
Does it answer your doubts?
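The two write patterns described here can be imitated on any local file system with GNU dd, as a rough sketch only. It says nothing about how gluster's FUSE or NFS translators are implemented. DIR, BS and COUNT are illustrative names. Pattern 2 starts a new dd process per block, so it also reopens the file each time and the two timings are not directly comparable.

```shell
#!/usr/bin/env bash
DIR=${DIR:-$(mktemp -d)}
BS=${BS:-65536}
COUNT=${COUNT:-8}

# Pattern 1: file opened with O_SYNC; each write() returns only once the
# data has reached stable storage.
write_osync() {
    dd if=/dev/zero of="$1" bs="$BS" count="$COUNT" oflag=sync 2>/dev/null
}

# Pattern 2: a plain write() followed by fsync(), repeated once per block.
write_then_fsync() {
    : > "$1"    # start from an empty file
    local i
    for i in $(seq 1 "$COUNT"); do
        dd if=/dev/zero of="$1" bs="$BS" count=1 \
           oflag=append conv=notrunc,fsync 2>/dev/null
    done
}

write_osync "$DIR/osync.bin"
write_then_fsync "$DIR/write_fsync.bin"
ls -l "$DIR"
```

Both files end up with the same size; only the way each block is forced to disk differs.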
On Wed, May 10, 2017 at 9:35 PM, Pat Haley <phaley@xxxxxxx> wrote:
Without the oflag=sync and only a single test of each, the FUSE
is going faster than NFS:
FUSE:
mseas-data2(dri_nascar)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 7.46961 s, 575 MB/s
NFS:
mseas-data2(HYCOM)% dd if=/dev/zero count=4096 bs=1048576 of=zeros.txt conv=sync
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 11.4264 s, 376 MB/s
On 05/10/2017 11:53 AM, Pranith
Kumar Karampuri wrote:
Could you let me know the
speed without oflag=sync on
both the mounts? No need to
collect profiles.
On Wed, May 10, 2017 at 9:17 PM, Pat Haley <phaley@xxxxxxx> wrote:
Here is what I see now:
[root@mseas-data2 ~]# gluster volume info
Volume Name: data-volume
Type: Distribute
Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: mseas-data2:/mnt/brick1
Brick2: mseas-data2:/mnt/brick2
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
nfs.exports-auth-enable: on
diagnostics.brick-sys-log-level: WARNING
performance.readdir-ahead: on
nfs.disable: on
nfs.export-volumes: off
On 05/10/2017 11:44 AM,
Pranith Kumar Karampuri wrote:
Is this the volume info
you have?
> [root@mseas-data2 ~]# gluster volume info
>
> Volume Name: data-volume
> Type: Distribute
> Volume ID: c162161e-2a2d-4dac-b015-f31fd89ceb18
> Status: Started
> Number of Bricks: 2
> Transport-type: tcp
> Bricks:
> Brick1: mseas-data2:/mnt/brick1
> Brick2: mseas-data2:/mnt/brick2
> Options Reconfigured:
> performance.readdir-ahead: on
> nfs.disable: on
> nfs.export-volumes: off
I copied this from old
thread from 2016. This is
distribute volume. Did
you change any of the
options in between?
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Pat Haley                          Email:  phaley@xxxxxxx
Center for Ocean Engineering       Phone:  (617) 253-6824
Dept. of Mechanical Engineering    Fax:    (617) 253-8125
MIT, Room 5-213                    http://web.mit.edu/phaley/www/
77 Massachusetts Avenue
Cambridge, MA 02139-4301
--
Pranith
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users