Hi,
Yes, of course:

[root@lucifer ~]# pdsh -w cl-storage[1,3] du -s /export/brick_home/brick*/amyloid_team
cl-storage1: 1608522280 /export/brick_home/brick1/amyloid_team
cl-storage3: 1619630616 /export/brick_home/brick1/amyloid_team
cl-storage1: 1614057836 /export/brick_home/brick2/amyloid_team
cl-storage3: 1602653808 /export/brick_home/brick2/amyloid_team
The sum is 6444864540 (around 6.4-6.5TB), while « quota list » displays 7.7TB. So the discrepancy is roughly 1.2-1.3TB, in other words around 16%, which seems far too large, no?
In addition, since the quota is exceeded, I notice a lot of files like the following:

[root@lucifer ~]# pdsh -w cl-storage[1,3] "cd /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/; ls -ail remd_100.sh 2> /dev/null" 2>/dev/null
cl-storage3: 133325688 ---------T 2 tarus amyloid_team 0 16 févr. 10:20 remd_100.sh

Note the 'T' at the end of the permissions and the file size of 0 bytes.
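If it helps, I can also dump the extended attributes of such an entry directly on the brick; I assume the trusted.* xattrs are what you would want to see (this is just a sketch, run against the same file as above):

pdsh -w cl-storage[1,3] "getfattr -d -m . -e hex /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_100.sh"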
Also, yesterday, some files were duplicated, but they are not anymore…
The worst part is that all these files were previously OK. In other words, exceeding the quota caused file or content deletions or corruption… What can I do to prevent this situation in the future? Because I guess I cannot do anything to roll back the situation now, right?
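For the future, would something like the following help, so that users see their own quota usage in df on the client instead of the whole volume size? This is just a guess on my side from the documentation (vol_home is my volume):

# make df on the client report the quota limit/usage of the directory
gluster volume set vol_home features.quota-deem-statfs on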
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
On Monday 08 June 2015 07:11 PM, Geoffrey Letessier wrote:
In addition, I notice a very big difference between the sum of du on each brick and the « quota list » display, as you can read below:
[root@lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/amyloid_team
cl-storage1: 1,6T /export/brick_home/brick1/amyloid_team
cl-storage3: 1,6T /export/brick_home/brick1/amyloid_team
cl-storage1: 1,6T /export/brick_home/brick2/amyloid_team
cl-storage3: 1,6T /export/brick_home/brick2/amyloid_team
[root@lucifer ~]# gluster volume quota vol_home list /amyloid_team
                  Path                   Hard-limit  Soft-limit    Used   Available
--------------------------------------------------------------------------------
/amyloid_team                               9.0TB        90%       7.8TB     1.2TB
As you can notice, the sum over all bricks gives roughly 6.4TB while « quota list » reports around 7.8TB, so there is a difference of 1.4TB that I am not able to explain… Do you have any idea?
There were a few issues with how quota accounts for size; we have fixed some of them in 3.7.
'df -h' will round off the values; can you please provide the output of 'df' without the -h option?
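Something like the following should give us the raw numbers (node list taken from your earlier commands):

pdsh -w cl-storage[1,3] df /export/brick_home/brick*
pdsh -w cl-storage[1,3] du -s /export/brick_home/brick*/amyloid_team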
Thanks,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
Hello,
Concerning version 3.5.3 of GlusterFS, I ran into a strange issue this morning when writing a file while the quota is exceeded.

One person in my lab, whose quota is exceeded (but she didn't know it), tried to modify a file; because of the exceeded quota she was unable to, and decided to exit vi. Now her file is empty/blank, as you can read below:
We suspect 'vi' might have created a tmp file before writing to the file. We are working on re-creating this problem and will keep you updated.
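Roughly, the reproduction we have in mind looks like this (the directory, mount point and sizes below are only placeholders, not your exact setup):

# set a small quota on a test directory, fill it, then try to edit and save a file with vi
gluster volume quota vol_home limit-usage /quota_test 10MB
dd if=/dev/zero of=/mnt/vol_home/quota_test/filler bs=1M count=10
vi /mnt/vol_home/quota_test/test.sh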
pdsh@lucifer: cl-storage3: ssh exited with exit code 2
cl-storage1: ---------T 2 tarus amyloid_team 0 19 févr. 12:34 /export/brick_home/brick1/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh
cl-storage1: -rwxrw-r-- 2 tarus amyloid_team 0 8 juin 12:38 /export/brick_home/brick2/amyloid_team/tarus/project/ab1-40-x1_sen304-x2_inh3-x2/remd_charmm22star_scripts/remd_115.sh
In addition, I don't understand why, my volume being a distributed volume inside a replica (cl-storage[1,3] is replicated only onto cl-storage[2,4]), I have two « identical » files (same complete path) on two different bricks (as you can read above).
Thanks in advance for your help and clarification.
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
Hi Ben,
I just checked my messages log files, both on the client and on the servers, and I don't find any of the hung tasks you noticed on yours.

As you can read below, I don't see the performance issue with a simple dd, but I think my issue concerns sets of small files (tens of thousands, if not more)…
[root@nisus test]# ddt -t 10g /mnt/test/
Writing to /mnt/test/ddt.8362 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /mnt/test/ddt.8362 ... done.
         10240MiB    KiB/s   CPU%
Write              114770      4
Read                40675      4
For info: /mnt/test is the single (v2) GlusterFS volume.
[root@nisus test]# ddt -t 10g /mnt/fhgfs/
Writing to /mnt/fhgfs/ddt.8380 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /mnt/fhgfs/ddt.8380 ... done.
         10240MiB    KiB/s   CPU%
Write              102591      1
Read                98079      2
Do you have an idea how I could tune/optimize the performance settings and/or the TCP settings (MTU, etc.)?
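For what it's worth, here are the kinds of knobs I was thinking of trying; the values are only guesses on my side, and <volname> / eth0 are placeholders:

gluster volume set <volname> performance.cache-size 256MB
gluster volume set <volname> performance.io-thread-count 32
gluster volume set <volname> performance.write-behind-window-size 4MB
# jumbo frames on the storage network (would have to be set on every node and switch port)
ip link set eth0 mtu 9000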
---------------------------------------------------------------
|             | UNTAR  |  DU   | FIND   |  TAR   |   RM   |
---------------------------------------------------------------
| single      | ~3m45s | ~43s  | ~47s   | ~3m10s | ~3m15s |
| replicated  | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
| distributed | ~4m18s | ~41s  | ~57s   | ~2m24s | ~1m38s |
| dist-repl   | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
| native FS   | ~11s   | ~4s   | ~2s    | ~56s   | ~10s   |
| BeeGFS      | ~3m43s | ~15s  | ~3s    | ~1m33s | ~46s   |
| single (v2) | ~3m6s  | ~14s  | ~32s   | ~1m2s  | ~44s   |
---------------------------------------------------------------
For info:
- BeeGFS is a distributed FS (4 bricks, 2 bricks per server, 2 servers)
- single (v2): a simple Gluster volume with default settings
I also note that I get the same tar/untar performance issue with FhGFS/BeeGFS, but the rest (du, find, rm) looks OK.

Thank you very much for your reply and help.
Geoffrey
-----------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
I am seeing problems on 3.7 as well. Can you check /var/log/messages on both the clients and servers for hung tasks like:
Jun 2 15:23:14 gqac006 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 2 15:23:14 gqac006 kernel: iozone D 0000000000000001 0 21999 1 0x00000080
Jun 2 15:23:14 gqac006 kernel: ffff880611321cc8 0000000000000082 ffff880611321c18 ffffffffa027236e
Jun 2 15:23:14 gqac006 kernel: ffff880611321c48 ffffffffa0272c10 ffff88052bd1e040 ffff880611321c78
Jun 2 15:23:14 gqac006 kernel: ffff88052bd1e0f0 ffff88062080c7a0 ffff880625addaf8 ffff880611321fd8
Jun 2 15:23:14 gqac006 kernel: Call Trace:
Jun 2 15:23:14 gqac006 kernel: [<ffffffffa027236e>] ? rpc_make_runnable+0x7e/0x80 [sunrpc]
Jun 2 15:23:14 gqac006 kernel: [<ffffffffa0272c10>] ? rpc_execute+0x50/0xa0 [sunrpc]
Jun 2 15:23:14 gqac006 kernel: [<ffffffff810aaa21>] ? ktime_get_ts+0xb1/0xf0
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811242d0>] ? sync_page+0x0/0x50
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8152a1b3>] io_schedule+0x73/0xc0
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8112430d>] sync_page+0x3d/0x50
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8152ac7f>] __wait_on_bit+0x5f/0x90
Jun 2 15:23:14 gqac006 kernel: [<ffffffff81124543>] wait_on_page_bit+0x73/0x80
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8109eb80>] ? wake_bit_function+0x0/0x50
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8113a525>] ? pagevec_lookup_tag+0x25/0x40
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8112496b>] wait_on_page_writeback_range+0xfb/0x190
Jun 2 15:23:14 gqac006 kernel: [<ffffffff81124b38>] filemap_write_and_wait_range+0x78/0x90
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c07ce>] vfs_fsync_range+0x7e/0x100
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c08bd>] vfs_fsync+0x1d/0x20
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c08fe>] do_fsync+0x3e/0x60
Jun 2 15:23:14 gqac006 kernel: [<ffffffff811c0950>] sys_fsync+0x10/0x20
Jun 2 15:23:14 gqac006 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
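If it is easier, a quick way to look for these on your nodes (assuming the default syslog location) is something like:

grep -B2 -A25 "hung_task_timeout_secs" /var/log/messages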
Do you see a perf problem with just a simple dd, or do you need a more complex workload to hit the issue? I think I saw an issue with metadata performance that I am trying to run down; let me know if you can see the problem with simple dd reads / writes or if we need to do some sort of dir / metadata access as well.
-b
----- Original Message -----
From: "Geoffrey Letessier" <geoffrey.letessier@xxxxxxx>
To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
Cc: gluster-users@xxxxxxxxxxx
Sent: Tuesday, June 2, 2015 8:09:04 AM
Subject: Re: GlusterFS 3.7 - slow/poor performances
Hi Pranith,
I'm sorry, but I cannot give you any comparison, because it would be distorted by the fact that in my HPC cluster in production the network technology is InfiniBand QDR and the volumes are quite different (bricks in RAID6 (12x2TB), 2 bricks per server and 4 servers in the pool).

Concerning your request, you can find all the expected results in the attachments; I hope it helps you solve this serious performance issue (maybe I need to play with the GlusterFS parameters?).

Thank you very much in advance,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
On 2 June 2015 at 10:09, Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> wrote:
Hi Geoffrey,

Since you are saying it happens on all types of volumes, let's do the following:
1) Create a dist-repl volume
2) Set the options etc. that you need
3) Enable gluster volume profiling using "gluster volume profile <volname> start"
4) Run the workload
5) Give the output of "gluster volume profile <volname> info"

Repeat the steps above on both the new and the old version you are comparing it with. That should give us insight into what could be causing the slowness.
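For example, a minimal run could look like this (the volume name, the output file and the workload itself are placeholders):

gluster volume profile <volname> start
# ... run the untar/du/find/tar/rm workload on the mounted volume ...
gluster volume profile <volname> info > profile-<volname>.txt
gluster volume profile <volname> stop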
Pranith
On 06/02/2015 03:22 AM, Geoffrey Letessier wrote:
Dear all,
I have a crash-test cluster where I tested the new version of GlusterFS (v3.7) before upgrading my HPC cluster in production. But… all my tests show very, very low performance.

For my benchmarks, as you can read below, I perform a few actions (untar, du, find, tar, rm) on the Linux kernel sources, dropping the caches each time, on distributed, replicated, distributed-replicated and single (single brick) volumes, as well as on the native FS of one brick.
# time (echo 3 > /proc/sys/vm/drop_caches; tar xJf ~/linux-4.1-rc5.tar.xz; sync; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; du -sh linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; find linux-4.1-rc5/ | wc -l; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; tar czf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
# time (echo 3 > /proc/sys/vm/drop_caches; rm -rf linux-4.1-rc5.tgz linux-4.1-rc5/; echo 3 > /proc/sys/vm/drop_caches)
And here are the process times:
---------------------------------------------------------------
|             | UNTAR  |  DU   | FIND   |  TAR   |   RM   |
---------------------------------------------------------------
| single      | ~3m45s | ~43s  | ~47s   | ~3m10s | ~3m15s |
| replicated  | ~5m10s | ~59s  | ~1m6s  | ~1m19s | ~1m49s |
| distributed | ~4m18s | ~41s  | ~57s   | ~2m24s | ~1m38s |
| dist-repl   | ~8m18s | ~1m4s | ~1m11s | ~1m24s | ~2m40s |
| native FS   | ~11s   | ~4s   | ~2s    | ~56s   | ~10s   |
---------------------------------------------------------------
I get the same results with default configurations as with custom configurations.

If I look at the ifstat command output, I note that my IO write processes never exceed 3MB/s...
The native EXT4 FS seems to be faster than the XFS one (roughly 15-20%, but no more).
My [test] storage cluster is composed of 2 identical servers (bi-CPU Intel Xeon X5355, 8GB of RAM, 2x2TB HDD (no RAID) and Gb Ethernet).
My volume settings (a rough creation sketch follows below):
- single: 1 server, 1 brick
- replicated: 2 servers, 1 brick each
- distributed: 2 servers, 2 bricks each
- dist-repl: 2 bricks on the same server and replica 2
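For reference, the test volumes were created more or less as follows; the hostnames and brick paths here are placeholders rather than my exact ones, and with replica 2 the consecutive bricks on the command line form the replica pairs:

gluster volume create single server1:/export/brick1/single
gluster volume create replicated replica 2 server1:/export/brick1/repl server2:/export/brick1/repl
gluster volume create distributed server1:/export/brick1/dist server1:/export/brick2/dist server2:/export/brick1/dist server2:/export/brick2/dist
gluster volume create dist-repl replica 2 server1:/export/brick1/dr server1:/export/brick2/dr server2:/export/brick1/dr server2:/export/brick2/dr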
Everything seems to be OK in the gluster status command output.

Do you have an idea why I am getting such bad results?
Thanks in advance.
Geoffrey
-----------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
CNRS - UPR 9080 - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users