On Tuesday 09 June 2015 01:08 PM, Geoffrey Letessier wrote:
Hi,
Yes of course:
[root@lucifer ~]# pdsh -w cl-storage[1,3] du -s /export/brick_home/brick*/amyloid_team
cl-storage1: 1608522280 /export/brick_home/brick1/amyloid_team
cl-storage3: 1619630616 /export/brick_home/brick1/amyloid_team
cl-storage1: 1614057836 /export/brick_home/brick2/amyloid_team
cl-storage3: 1602653808 /export/brick_home/brick2/amyloid_team
The sum is 6444864540 (around 6.4-6.5TB), while the quota list displays 7.7TB.
So the discrepancy is roughly 1.2-1.3TB, in other words around 16%, which is too large, no?
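The arithmetic above can be cross-checked with a short sketch. It assumes that `du -s` reports 1 KiB blocks (the GNU default) and, as a further assumption that this thread does not confirm, that the "TB" shown by the quota list is a 1024-based unit. Under those assumptions the gap would be closer to 22% than 16%; the helper names are illustrative only.

```python
# Per-brick `du -s` figures copied from the message above.
# Assumption: they are counts of 1 KiB blocks (GNU du default).
DU_KIB = {
    ("cl-storage1", "brick1"): 1608522280,
    ("cl-storage3", "brick1"): 1619630616,
    ("cl-storage1", "brick2"): 1614057836,
    ("cl-storage3", "brick2"): 1602653808,
}


def total_kib(per_brick):
    """Sum the per-brick usage, still expressed in KiB."""
    return sum(per_brick.values())


def kib_to_tib(kib):
    """Convert KiB to binary terabytes (1 TiB = 1024**3 KiB)."""
    return kib / 1024**3


def kib_to_tb(kib):
    """Convert KiB to decimal terabytes (1 TB = 10**12 bytes)."""
    return kib * 1024 / 10**12


def gap_fraction(quota_used, du_total):
    """Fraction of the quota figure that du does not account for."""
    return (quota_used - du_total) / quota_used


if __name__ == "__main__":
    total = total_kib(DU_KIB)
    print("total KiB:", total)
    print("as TiB   : %.2f" % kib_to_tib(total))
    print("as TB    : %.2f" % kib_to_tb(total))
    # 7.7 is the "Used" value reported by the quota list in the message.
    print("gap if quota is in TiB: %.1f%%" % (100 * gap_fraction(7.7, kib_to_tib(total))))
    print("gap if quota is in TB : %.1f%%" % (100 * gap_fraction(7.7, kib_to_tb(total))))
```

Whichever unit applies, the discrepancy stays well above 1TB, so the question raised in the message stands.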
In addition, since the quota was exceeded, I have noticed a lot of files like the following:
[root@lucifer ~]# pdsh -w cl-storage[1,3] "cd /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/; ls -ail remd_100.sh 2> /dev/null" 2>/dev/null
cl-storage3: 133325688 ---------T 2 tarus amyloid_team 0 16 févr. 10:20 remd_100.sh
Note the 'T' at the end of the permissions and the file size of 0 bytes.
Also, yesterday some files were duplicated, but they no longer are...
The worst part is that all these files were previously OK. In other words, exceeding the quota caused files or their contents to be deleted or corrupted. What can I do to prevent this situation in the future? I guess I cannot do anything to roll back this situation now, right?
Hi Geoffrey,
I tried re-creating the problem.
Here is the behaviour of the vi editor.
When a file is saved in vi, it creates a backup file under the home dir and opens the original file with the 'O_TRUNC' flag; hence the file was truncated.
Here is the strace of vi when it gets the 'EDQUOT' error:
open("hello", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 3
write(3, "line one\nline two\n", 18) = 18
fsync(3) = 0
close(3) = -1 EDQUOT (Disk quota exceeded)
chmod("hello", 0100644) = 0
open("/root/hello~", O_RDONLY) = 3
open("hello", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 7
read(3, "line one\n", 256) = 9
write(7, "line one\n", 9) = 9
read(3, "", 256) = 0
close(7) = -1 EDQUOT (Disk quota exceeded)
close(3) = 0
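The trace shows the original file being opened with O_TRUNC before the new content is known to be safely stored, so a late EDQUOT on close() leaves it empty. As a minimal sketch of a save pattern that avoids this (not what vi does, and `save_atomically` is a hypothetical helper name), an application can write to a temporary file in the same directory and rename it over the original only after write, fsync and close have all succeeded:

```python
import os
import shutil
import tempfile


def save_atomically(path, data):
    """Replace `path` with `data` (bytes) without ever truncating the original.

    The new content goes to a temporary file in the same directory. Only
    when write, fsync and close have all succeeded is the temporary file
    renamed over the original. If any step fails (for example with
    EDQUOT), the original file keeps its previous content.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".save-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Leaving the with-block closes the file; a failing close() raises
        # OSError, so the rename below is skipped in that case.
        if os.path.exists(path):
            # mkstemp creates the file with mode 0600; keep the old mode.
            shutil.copymode(path, tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file and leave the original untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With the quota already exceeded, this save would still fail, but the failure would surface as an error while the previous content of the file stays in place.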
To recover the truncated file, please check whether there is a backup file 'remd_115.sh~' under '~/' or in the same directory as the file. If one exists, you can copy it back.
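The search described above can be sketched as follows. The function name and the choice to ignore empty backups are assumptions made for illustration; the two locations checked are the ones named in the message (the file's own directory and the home directory).

```python
import os


def find_backups(original_path, home=None):
    """Return existing, non-empty '<name>~' backups for `original_path`.

    Looks in the directory of the original file and in the home
    directory, the two places mentioned in the message above.
    """
    home = home or os.path.expanduser("~")
    backup_name = os.path.basename(original_path) + "~"
    candidates = [
        os.path.join(os.path.dirname(os.path.abspath(original_path)), backup_name),
        os.path.join(home, backup_name),
    ]
    # A zero-byte backup would not help, so it is skipped.
    return [c for c in candidates if os.path.isfile(c) and os.path.getsize(c) > 0]
```

It should be run as the user who edited the file (or with `home` pointing at her home directory), since '~/' refers to the home of whoever ran vi.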
Thanks,
Vijay
Geoffrey
------------------------------------------------------
Geoffrey Letessier
IT manager & systems engineer
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
On Monday 08 June 2015 07:11 PM, Geoffrey Letessier wrote:
In addition, I notice a very big difference between the sum of du on each brick and the "quota list" display, as you can read below:
[root@lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/amyloid_team
cl-storage1: 1,6T /export/brick_home/brick1/amyloid_team
cl-storage3: 1,6T /export/brick_home/brick1/amyloid_team
cl-storage1: 1,6T /export/brick_home/brick2/amyloid_team
cl-storage3: 1,6T /export/brick_home/brick2/amyloid_team
[root@lucifer ~]# gluster volume quota vol_home list /amyloid_team
                  Path                   Hard-limit Soft-limit   Used  Available
--------------------------------------------------------------------------------
/amyloid_team                              9.0TB       90%       7.8TB   1.2TB
As you can see, the sum over all bricks gives me roughly 6.4TB, while "quota list" shows around 7.8TB; so there is a difference of 1.4TB that I'm not able to explain… Do you have any idea?
There were a few issues with how quota accounts for size; we have fixed some of these issues in 3.7.
'df -h' will round off the values; can you please provide the output of 'df' without the -h option?
Thanks,
Geoffrey
Hello,
Concerning version 3.5.3 of GlusterFS, I ran into a strange issue this morning when writing a file while the quota was exceeded.
One person in my lab, whose quota was exceeded (but she didn't know it), tried to modify a file but, because of the exceeded quota, was unable to and decided to exit vi. Now her file is empty/blank, as you can read below:
We suspect 'vi' might have created a tmp file before writing to the file. We are working on re-creating this problem and will update you on the same.
pdsh@lucifer: cl-storage3: ssh exited with exit code 2
cl-storage1: ---------T 2 tarus amyloid_team 0 19 févr. 12:34 /export/brick_home/brick1/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh
cl-storage1: -rwxrw-r-- 2 tarus amyloid_team 0 8 juin 12:38 /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh
In addition, I don't understand why, my volume being a distributed volume on top of replication (cl-storage[1,3] is replicated only on cl-storage[2,4]), I have 2 "same" files (same complete path) in 2 different bricks (as you can read above).
Thanks in advance for your help and clarification.
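A possible explanation, offered as an assumption rather than something confirmed in this thread: to my understanding, the distribute (DHT) layer of GlusterFS can leave a zero-byte pointer entry with mode `---------T` on one brick when the data file actually lives on another brick. That would match the listing above, where the brick1 entry has mode `---------T` and the brick2 entry has the real permissions. The sketch below only flags entries that look like such pointers; the function name is hypothetical.

```python
import stat


def looks_like_dht_linkfile(st_mode, st_size):
    """True for a zero-byte regular file whose only mode bit is the sticky bit.

    `ls -l` prints such a file as '---------T'. This is a heuristic based
    on mode and size only; it does not inspect extended attributes.
    """
    return (
        stat.S_ISREG(st_mode)
        and stat.S_IMODE(st_mode) == stat.S_ISVTX
        and st_size == 0
    )
```

On a brick, the values would come from `os.lstat(path)` (`st.st_mode`, `st.st_size`). To confirm a candidate, the pointer is expected to carry an extended attribute that `getfattr -n trusted.glusterfs.dht.linkto -e text <file>` would display; that attribute name is stated from memory and should be checked against the GlusterFS documentation.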
Geoffrey
Hi Ben,
I just checked my messages log files, on both client and server, and I don't find any hung tasks like the ones you see on yours.
As you can read below, I don't see the performance issue with a simple dd, but I think my issue concerns sets of small files (tens of thousands, or even more)…
[root@nisus test]# ddt -t 10g /mnt/test/
Writing to /mnt/test/ddt.8362 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /mnt/test/ddt.8362 ... done.
10240MiB    KiB/s  CPU%
Write      114770     4
Read        40675     4
For info: /mnt/test is the single (v2) GlusterFS volume.
[root@nisus test]# ddt -t 10g /mnt/fhgfs/
Writing to /mnt/fhgfs/ddt.8380 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /mnt/fhgfs/ddt.8380 ... done.
10240MiB    KiB/s  CPU%
Write      102591     1
Read        98079     2
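To put the ddt figures in context, the sketch below converts KiB/s to Gbit/s. It assumes the client reaches these volumes over the Gb Ethernet link described for the test cluster, which the ddt output itself does not state. Under that assumption, the two write rates (114770 and 102591 KiB/s) are close to line rate, which is consistent with large sequential writes not being the bottleneck.

```python
def kib_per_s_to_gbit_per_s(kib_per_s):
    """Convert a throughput in KiB/s to Gbit/s (decimal gigabits)."""
    return kib_per_s * 1024 * 8 / 1e9


if __name__ == "__main__":
    # Figures copied from the two ddt runs above.
    for label, rate in [
        ("GlusterFS write", 114770),
        ("GlusterFS read", 40675),
        ("BeeGFS write", 102591),
        ("BeeGFS read", 98079),
    ]:
        print("%-16s %.2f Gbit/s" % (label, kib_per_s_to_gbit_per_s(rate)))
```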
Do you have any idea how to tune/optimize the performance settings and/or TCP settings (MTU, etc.)?
------------------------------------------------------------
|             | UNTAR  | DU     | FIND   | TAR    | RM     |
------------------------------------------------------------
| single      | ~3m45s | ~43s   | ~47s   | ~3m10s | ~3m15s |
------------------------------------------------------------
| replicated  | ~5m10s | ~59s   | ~1m6s  | ~1m19s | ~1m49s |
------------------------------------------------------------
| distributed | ~4m18s | ~41s   | ~57s   | ~2m24s | ~1m38s |
------------------------------------------------------------
| dist-repl   | ~8m18s | ~1m4s  | ~1m11s | ~1m24s | ~2m40s |
------------------------------------------------------------
| native FS   | ~11s   | ~4s    | ~2s    | ~56s   | ~10s   |
------------------------------------------------------------
| BeeGFS      | ~3m43s | ~15s   | ~3s    | ~1m33s | ~46s   |
------------------------------------------------------------
| single (v2) | ~3m6s  | ~14s   | ~32s   | ~1m2s  | ~44s   |
------------------------------------------------------------
For info:
- BeeGFS is a distributed FS (4 bricks, 2 bricks per server and 2 servers)
- single (v2): simple gluster volume with default settings
I also note that I get the same tar/untar performance issue with FhGFS/BeeGFS, but the rest (du, find, rm) looks OK.
Thank you very much for your reply and help.
Geoffrey
I am seeing problems on 3.7 as well. Can you check /var/log/messages on both the clients and servers for hung tasks like:
Jun 2 15:23:14 gqac006 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 2 15:23:14 gqac006 kernel: iozone D 0000000000000001 0 21999 1 0x00000080
Jun 2 15:23:14 gqac006 kernel: ffff880611321cc8 0000000000000082 ffff880611321c18 ffffffffa027236e
Jun 2 15:23:14 gqac006 kernel: ffff880611321c48 ffffffffa0272c10 ffff88052bd1e040 ffff880611321c78
Jun 2 15:23:14 gqac006 kernel: ffff88052bd1e0f0 ffff88062080c7a0 ffff880625addaf8 ffff880611321fd8
Jun 2 15:23:14 gqac006 kernel: Call Trace:
Jun 2 15:23:14 gqac006 kernel: [<ffffffffa027236e>] ? rpc_make_runnable+0x7e/0x80 [sunrpc]
Jun 2 15:23:14 gqac006 kernel: [<ffffffffa0272c10>] ? rpc_execute+0x50/0xa0 [sunrpc]
Jun 2 15:23:14 gqac006 kernel: [<ffffffff810aaa21>] ? ktime_get_ts+0xb1/0xf0
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811242d0>] ? sync_page+0x0/0x50
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8152a1b3>] io_schedule+0x73/0xc0
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8112430d>] sync_page+0x3d/0x50
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8152ac7f>] __wait_on_bit+0x5f/0x90
Jun 2 15:23:14 gqac006 kernel: [<ffffffff81124543>] wait_on_page_bit+0x73/0x80
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8109eb80>] ? wake_bit_function+0x0/0x50
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8113a525>] ? pagevec_lookup_tag+0x25/0x40
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8112496b>] wait_on_page_writeback_range+0xfb/0x190
Jun 2 15:23:14 gqac006 kernel: [<ffffffff81124b38>] filemap_write_and_wait_range+0x78/0x90
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c07ce>] vfs_fsync_range+0x7e/0x100
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c08bd>] vfs_fsync+0x1d/0x20
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c08fe>] do_fsync+0x3e/0x60
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c0950>] sys_fsync+0x10/0x20
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
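Checking many clients and servers for reports like the one above can be scripted. The sketch below filters log lines on two markers: the `hung_task_timeout_secs` hint visible in the excerpt, and the "blocked for more than N seconds" line that the kernel's hung-task detector prints just before it. The function name is illustrative.

```python
import re

# Markers of a hung-task report. The second one appears in the excerpt
# above; the first is the kernel line that normally precedes it.
HUNG_PATTERNS = (
    re.compile(r"blocked for more than \d+ seconds"),
    re.compile(r"hung_task_timeout_secs"),
)


def find_hung_task_lines(lines):
    """Return the log lines that belong to a hung-task report header."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in HUNG_PATTERNS)
    ]


if __name__ == "__main__":
    # Reading /var/log/messages usually requires root.
    with open("/var/log/messages", errors="replace") as log:
        for hit in find_hung_task_lines(log):
            print(hit)
```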
Do you see a perf problem with just a simple DD, or do you need a more complex workload to hit the issue? I think I saw an issue with metadata performance that I am trying to run down; let me know if you can see the problem with simple DD reads / writes or if we need to do some sort of dir / metadata access as well.
-b
----- Original Message -----
From: "Geoffrey Letessier" <geoffrey.letessier@xxxxxxx>
To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
Cc: gluster-users@xxxxxxxxxxx
Sent: Tuesday, June 2, 2015 8:09:04 AM
Subject: Re: GlusterFS 3.7 - slow/poor performances
Hi Pranith,
I'm sorry, but I cannot give you any comparison, because it would be distorted by the fact that in my production HPC cluster the network technology is InfiniBand QDR and my volumes are quite different (bricks in RAID6 (12x2TB), 2 bricks per server and 4 servers in my pool).
Concerning your request, in the attachments you can find all the expected results, hoping they can help you solve this serious performance issue (maybe I need to play with GlusterFS parameters?).
Thank you very much in advance,
Geoffrey
On 2 June 2015 at 10:09, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
hi Geoffrey,
Since you are saying it happens on all types of volumes, let's do the following:
1) Create a dist-repl volume
2) Set the options etc. you need.
3) Enable gluster volume profile using "gluster volume profile <volname> start"
4) Run the workload
5) Give the output of "gluster volume profile <volname> info"
Repeat the steps above on the new and old versions you are comparing.
That should give us insight into what could be causing the slowness.
Pranith
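Steps 3 to 5 above can be wrapped in a small script so that the same sequence runs unchanged on both versions. This is a minimal sketch: the two gluster commands are the ones quoted in the message, while the function names, the `run` parameter and the `testvol` volume name are assumptions made for illustration.

```python
import subprocess


def profile_commands(volname):
    """Build the two CLI invocations quoted in the steps above."""
    base = ["gluster", "volume", "profile", volname]
    return {"start": base + ["start"], "info": base + ["info"]}


def profile_workload(volname, workload, run=subprocess.run):
    """Start profiling, run `workload()`, and return the profile info text.

    `run` defaults to subprocess.run and can be swapped for a stub when
    no gluster CLI is available.
    """
    commands = profile_commands(volname)
    run(commands["start"], check=True)
    workload()
    result = run(commands["info"], check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical volume name and workload; replace with the real ones.
    output = profile_workload(
        "testvol",
        lambda: subprocess.run(["tar", "xJf", "linux-4.1-rc5.tar.xz"], check=True),
    )
    print(output)
```

Saving the returned text once per version gives two files that can be compared side by side.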
On 06/02/2015 03:22 AM, Geoffrey Letessier wrote:
Dear all,
I have a crash-test cluster where I've tested the new version of GlusterFS (v3.7) before upgrading my production HPC cluster.
But… all my tests show very, very low performance.
For my benchmarks, as you can read below, I perform some actions (untar, du, find, tar, rm) on the Linux kernel sources, dropping caches each time, on distributed, replicated, distributed-replicated and single (single-brick) volumes and on the native FS of one brick.
# time (echo 3 > /proc/sys/vm/drop_caches; tar xJf ~/linux-4.1-rc5.tar.xz; sync; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; find linux-4.1-rc5/|wc -l; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; rm -rf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
And here are the process times:
------------------------------------------------------------
|             | UNTAR  | DU     | FIND   | TAR    | RM     |
------------------------------------------------------------
| single      | ~3m45s | ~43s   | ~47s   | ~3m10s | ~3m15s |
------------------------------------------------------------
| replicated  | ~5m10s | ~59s   | ~1m6s  | ~1m19s | ~1m49s |
------------------------------------------------------------
| distributed | ~4m18s | ~41s   | ~57s   | ~2m24s | ~1m38s |
------------------------------------------------------------
| dist-repl   | ~8m18s | ~1m4s  | ~1m11s | ~1m24s | ~2m40s |
------------------------------------------------------------
| native FS   | ~11s   | ~4s    | ~2s    | ~56s   | ~10s   |
------------------------------------------------------------
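To make the gap to the native FS explicit, the untar timings from the table can be turned into slowdown factors. The sketch below copies only the UNTAR column; the parsing helper and its name are illustrative.

```python
import re


def parse_duration(text):
    """Convert a value such as '~3m45s' or '~11s' to seconds."""
    match = re.fullmatch(r"~?(?:(\d+)m)?(?:(\d+)s)?", text.strip())
    if not match or not (match.group(1) or match.group(2)):
        raise ValueError("unrecognised duration: %r" % text)
    minutes = int(match.group(1) or 0)
    seconds = int(match.group(2) or 0)
    return minutes * 60 + seconds


# UNTAR column copied from the table above.
UNTAR = {
    "single": "~3m45s",
    "replicated": "~5m10s",
    "distributed": "~4m18s",
    "dist-repl": "~8m18s",
    "native FS": "~11s",
}


def slowdown_vs_native(times, baseline="native FS"):
    """Return how many times slower each volume type is than the baseline."""
    base = parse_duration(times[baseline])
    return {
        name: parse_duration(value) / base
        for name, value in times.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, factor in slowdown_vs_native(UNTAR).items():
        print("%-12s %.1fx slower than native FS" % (name, factor))
```

Since the timings in the table are approximate, the factors are indicative only.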
I get the same results whether with the default configuration or with custom configurations.
Looking at the output of the ifstat command, I can see that my write IO never exceeds 3MB/s...
The EXT4 native FS seems to be faster (roughly 15-20%, but no more) than the XFS one.
My [test] storage cluster is composed of 2 identical servers (dual-CPU Intel Xeon X5355, 8GB of RAM, 2x2TB HDD (no RAID) and Gb Ethernet).
My volume settings:
single: 1 server, 1 brick
replicated: 2 servers, 1 brick each
distributed: 2 servers, 2 bricks each
dist-repl: 2 bricks in the same server and replica 2
Everything seems to be OK in the gluster status command output.
Do you have any idea why I get such bad results?
Thanks in advance.
Geoffrey
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users