I’m not sure I fully understand your answer.
I tried setting inode-lru-limit to 1, but I cannot see any positive effect.
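For reference, here is roughly how I set it (a sketch; I am assuming the option meant here is the network.inode-lru-limit volume option and that vol_home is the affected volume):
# gluster volume set vol_home network.inode-lru-limit 1
# gluster volume info vol_home     # the new value should appear under "Options Reconfigured"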
When I re-run the ddt application, I notice two kinds of messages:
[2015-08-07 21:29:21.792156] W [marker-quota.c:3379:_mq_initiate_quota_txn] 0-vol_home-marker: parent is NULL for <gfid:5a32328a-7fd9-474e-9bc6-cafde9c41af7>, aborting updation txn
[2015-08-07 21:29:21.792176] W [marker-quota.c:3379:_mq_initiate_quota_txn] 0-vol_home-marker: parent is NULL for <gfid:5a32328a-7fd9-474e-9bc6-cafde9c41af7>, aborting updation txn
and/or:
[2015-08-07 21:44:19.279971] E [marker-quota.c:2990:mq_start_quota_txn_v2] 0-vol_home-marker: contribution node list is empty (31d7bf88-b63a-4731-a737-a3dce73b8cd1)
[2015-08-07 21:41:26.177095] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) [0x7f85e9a6a410] -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) [0x7f85e9a6a188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) [0x3e99c20674] ) 0-dict: invalid argument: dict [Argument invalide]
And what about the poor I/O performance?
[letessier@node031 ~]$ ddt -t 35g /home/admin_team/letessier/
Writing to /home/admin_team/letessier/ddt.25259 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /home/admin_team/letessier/ddt.25259 ... done.
      35840MiB    KiB/s  CPU%
Write            277451     3
Read             188682     1
[letessier@node031 ~]$ logout
[root@node031 ~]# ddt -t 35g /home/
Writing to /home/ddt.25559 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /home/ddt.25559 ... done.
      35840MiB    KiB/s  CPU%
Write            196539     2
Read             438944     3
Notice the read/write throughput differences when I’m root and when I’m a simple user.
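As a side note, a plain dd run can be used to cross-check these numbers independently of ddt (only a rough sketch; the file path and size below are arbitrary examples, not the ddt tool itself):
[root@node031 ~]# dd if=/dev/zero of=/home/dd-check.bin bs=1M count=35840 conv=fdatasync   # sequential write, flushed before timing ends
[root@node031 ~]# echo 3 > /proc/sys/vm/drop_caches                                        # drop the client page cache before the read pass
[root@node031 ~]# dd if=/home/dd-check.bin of=/dev/null bs=1M                              # sequential read
[root@node031 ~]# rm -f /home/dd-check.bin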
Thanks. Geoffrey
------------------------------------------------------ Geoffrey Letessier Responsable informatique & ingénieur système UPR 9080 - CNRS - Laboratoire de Biochimie Théorique Institut de Biologie Physico-Chimique 13, rue Pierre et Marie Curie - 75005 Paris Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
On Friday 07 August 2015 05:34 PM, Geoffrey Letessier wrote:
Hi Vijay,
My brick-log issue and the big performance problem began when I upgraded GlusterFS to version 3.7.3; before that, write throughput was good enough (~500MB/s) -but not as good as with GlusterFS 3.5.3 (especially with distributed volumes)- and I didn't notice these problems in the brick logs.
OK… live update: I just disabled quota for my home volume and now my performance appears to be somewhat better (around 300MB/s), but I still see the logs (on storage1 and its replica storage2) growing with only this kind of line:
[2015-08-07 11:16:51.746142] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) [0x7f85e9a6a410] -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) [0x7f85e9a6a188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) [0x3e99c20674] ) 0-dict: invalid argument: dict [Argument invalide]
We have root-caused the log issue; bug# 1244613 tracks it.
After a few minutes, my write throughput now seems correct (~550MB/s), but the logs are still growing (not to say exploding). So one part of the problem seems to originate in the quota management system.
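For the record, the quota toggling mentioned above is done with the standard quota commands, roughly like this (a sketch; vol_home is the volume name taken from the log entries):
# gluster volume quota vol_home disable    # turn quota off for the volume
# gluster volume quota vol_home enable     # turn quota back on
As far as I know, re-enabling quota starts a crawl to rebuild the accounting, which can itself generate load for a while and skew the numbers just after re-enabling.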
… after a few more minutes (and still with only 1 client connected), now it is the read operation which is very, very slow… -I'm going crazy! :/-
# ddt -t 50g /home/
Writing to /home/ddt.11293 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /home/ddt.11293 ... done.
      35840MiB    KiB/s  CPU%
Write            568201     5
Read             567008     4

# ddt -t 50g /home/
Writing to /home/ddt.11397 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /home/ddt.11397 ... done.
      51200MiB    KiB/s  CPU%
Write            573631     5
Read             164716     1
and my logs are still exploding…
After re-enabling quota on my volume:
# ddt -t 50g /home/
Writing to /home/ddt.11817 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /home/ddt.11817 ... done.
      51200MiB    KiB/s  CPU%
Write            269608     3
Read             160219     1
Thanks
Geoffrey
Hi Geoffrey,
Some performance improvements have been made to quota in glusterfs-3.7.3.
Could you upgrade to glusterfs-3.7.3 and see if this helps?
Thanks,
Vijay
On Friday 07 August 2015 05:02 AM, Geoffrey Letessier wrote:
Hi,
Any idea to help me fix this issue? (huge logs, write performance divided by 4, etc.)
For comparison, here are two volumes:
- home: distributed on 4 bricks / 2 nodes (and replicated on 4 other bricks / 2 other nodes):
# ddt -t 35g /home
Writing to /home/ddt.24172 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /home/ddt.24172 ... done.
      33792MiB    KiB/s  CPU%
Write            103659     1
Read             391955     3
- workdir: distributed on 4 bricks / 2 nodes (on the same RAID volumes and servers as home):
# ddt -t 35g /workdir
Writing to /workdir/ddt.24717 ... syncing ... done.
sleeping 10 seconds ... done.
Reading from /workdir/ddt.24717 ... done.
      35840MiB    KiB/s  CPU%
Write            738314     4
Read             536497     4
For information, with the previous 3.5.3-2 version I obtained roughly 1.1GB/s on the workdir volume and ~550-600MB/s on home.
All my tests (cp, rsync, etc.) give me the same result (write throughput between 100MB/s and 150MB/s).
Thanks.
Geoffrey
Hello,
In addition, knowing that I reactivated logging (brick-log-level = INFO instead of CRITICAL) only for the duration of the file creation (i.e. a few minutes), have you noticed the log sizes and the number of lines in them:
# ls -lh storage*
-rw------- 1 letessier staff  18M  5 aoû 00:54 storage1__export-brick_home-brick1-data.log
-rw------- 1 letessier staff 2,1K  5 aoû 00:54 storage1__export-brick_home-brick2-data.log
-rw------- 1 letessier staff  15M  5 aoû 00:56 storage2__export-brick_home-brick1-data.log
-rw------- 1 letessier staff 2,1K  5 aoû 00:54 storage2__export-brick_home-brick2-data.log
-rw------- 1 letessier staff  47M  5 aoû 00:55 storage3__export-brick_home-brick1-data.log
-rw------- 1 letessier staff 2,1K  5 aoû 00:54 storage3__export-brick_home-brick2-data.log
-rw------- 1 letessier staff  47M  5 aoû 00:55 storage4__export-brick_home-brick1-data.log
-rw------- 1 letessier staff 2,1K  5 aoû 00:55 storage4__export-brick_home-brick2-data.log
# wc -l storage*
  55381 storage1__export-brick_home-brick1-data.log
     17 storage1__export-brick_home-brick2-data.log
  41636 storage2__export-brick_home-brick1-data.log
     17 storage2__export-brick_home-brick2-data.log
 270360 storage3__export-brick_home-brick1-data.log
     17 storage3__export-brick_home-brick2-data.log
 270358 storage4__export-brick_home-brick1-data.log
     17 storage4__export-brick_home-brick2-data.log
 637803 total
If I leave brick-log-level at INFO, the brick log files on each server will consume all of my /var partition capacity within only a few hours or days…
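For reference, the brick-log-level toggling mentioned above corresponds to the diagnostics.brick-log-level volume option, set along these lines (a sketch; vol_home is assumed to be the affected volume):
# gluster volume set vol_home diagnostics.brick-log-level INFO       # verbose, to capture the errors
# gluster volume set vol_home diagnostics.brick-log-level CRITICAL   # quiet again, to protect /var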
Thanks in advance,
Geoffrey
Hello,
Since the problem mentioned previously (all the errors noticed in the brick log files), I have seen very, very bad performance: my write performance is divided by 4 compared to before -knowing it was not that good to begin with.
Now, when writing a 33GB file, my write throughput is around 150MB/s (over InfiniBand); before, it was around 550-600MB/s, and this with both the RDMA and TCP protocols.
During this test, more than 40,000 error lines (like the following) were added to the brick log files.
[2015-08-04 22:34:27.337622] E [dict.c:1418:dict_copy_with_ref] (-->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(server_resolve_inode+0x60) [0x7f021c6f7410] -->/usr/lib64/glusterfs/3.7.3/xlator/protocol/server.so(resolve_gfid+0x88) [0x7f021c6f7188] -->/usr/lib64/libglusterfs.so.0(dict_copy_with_ref+0xa4) [0x7f0229cba674] ) 0-dict: invalid argument: dict [Argument invalide]
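In case it is useful, that count can be reproduced with a simple grep on each server (a sketch, assuming the brick logs live in the default /var/log/glusterfs/bricks directory):
# grep -c "dict_copy_with_ref" /var/log/glusterfs/bricks/*.log    # error-line count per brick log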
All brick log files are attached.
Thanks in advance for all your help and for a fix,
Best,
Geoffrey
PS: question: is it possible to easily downgrade GlusterFS from 3.7 to a previous version (for example, v3.5)?
<bricks-logs.tgz>