Hello,
@Krutika: Thanks for transferring my issue.
Everything has gone completely haywire: other quotas are exploding too. Indeed, after removing my previously failing quota, some other quotas have grown, as you can see below:
[root@lucifer ~]# gluster volume quota vol_home list
                  Path                   Hard-limit Soft-limit     Used  Available
--------------------------------------------------------------------------------
/baaden_team                                 20.0TB        90%   15.1TB     4.9TB
/sterpone_team                               14.0TB        90%   25.5TB    0Bytes
/simlab_team                                  5.0TB        90%    1.3TB     3.7TB
/sacquin_team                                10.0TB        90%    8.3TB     1.7TB
/admin_team                                   1.0TB        90%   17.0GB  1007.0GB
/amyloid_team                                 7.0TB        90%    6.4TB   577.5GB
/amyloid_team/nguyen                          4.0TB        90%    3.7TB   312.7GB
[root@lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/sterpone_team
cl-storage1: 3,1T    /export/brick_home/brick1/sterpone_team
cl-storage1: 2,3T    /export/brick_home/brick2/sterpone_team
cl-storage3: 2,7T    /export/brick_home/brick1/sterpone_team
cl-storage3: 2,9T    /export/brick_home/brick2/sterpone_team

=> ~11TB (not 25.5TB!!!)
[root@lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/baaden_team
cl-storage1: 4,2T    /export/brick_home/brick1/baaden_team
cl-storage3: 3,7T    /export/brick_home/brick1/baaden_team
cl-storage1: 3,6T    /export/brick_home/brick2/baaden_team
cl-storage3: 3,5T    /export/brick_home/brick2/baaden_team

=> ~15TB (not 14TB).
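A quick way to cross-check the quota accounting against what the bricks actually hold is to sum raw du output across all bricks. A minimal sketch, assuming pdsh's "host: size path" line format; the `sum_tb` helper is hypothetical, and it expects `du -s` (plain 1K-block counts), not `du -sh`:

```shell
# Hypothetical helper: sum the size field (KB) of pdsh-prefixed du output
# and print the total in TB.
sum_tb() {
    awk '{ total += $2 } END { printf "%.1f\n", total / 1024 ^ 3 }'
}

# Intended use against the cluster (note du -s, not du -sh, so awk sees
# plain kilobyte counts rather than human-readable sizes):
#   pdsh -w cl-storage[1,3] du -s /export/brick_home/brick*/sterpone_team | sum_tb

# Stand-alone demo with canned output (3TB + 2TB worth of kilobytes):
printf '%s\n' \
    'cl-storage1: 3221225472 /export/brick_home/brick1/sterpone_team' \
    'cl-storage3: 2147483648 /export/brick_home/brick2/sterpone_team' | sum_tb
# prints 5.0
```

Comparing that total with the "Used" column of `gluster volume quota ... list` makes the accounting drift obvious.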
Etc.
Could you please help me solve this issue urgently? The situation is blocking, and I must halt production until it is resolved.
Do you think upgrading the storage cluster to GlusterFS 3.7.1 (the latest version) could fix the problem?
Thanks in advance,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
Responsable informatique & ingénieur système
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx
Copying Vijai and Raghavendra for help...
-Krutika
From: "Geoffrey Letessier" <geoffrey.letessier@xxxxxxx>
To: "Krutika Dhananjay" <kdhananj@xxxxxxxxxx>
Sent: Saturday, June 27, 2015 2:13:52 AM
Subject: Re: GlusterFS 3.5.3 - untar: very poor performance
Hi Krutika,
Since I re-enabled the quota feature on my volume vol_home, one of the defined quotas has gone crazy… and it's a very, very big problem for us.
All day after re-enabling it, I watched the reported used space keep growing (with no user I/O going on).
[root@lucifer ~]# gluster volume quota vol_home list | grep derreumaux_team
/derreumaux_team                             14.0TB        80%   13.7TB   357.2GB

[root@lucifer ~]# gluster volume quota vol_home list /derreumaux_team
                  Path                   Hard-limit Soft-limit     Used  Available
--------------------------------------------------------------------------------
/derreumaux_team                             14.0TB        80%   13.1TB   874.1GB

[root@lucifer ~]# pdsh -w cl-storage[1,3] du -sh /export/brick_home/brick*/derreumaux_team
cl-storage3: 590G    /export/brick_home/brick1/derreumaux_team
cl-storage3: 611G    /export/brick_home/brick2/derreumaux_team
cl-storage1: 567G    /export/brick_home/brick1/derreumaux_team
cl-storage1: 564G    /export/brick_home/brick2/derreumaux_team
As you can see from these three commands, I get three different results; worse, the quota accounting is very far from the real disk usage (13.7TB vs 13.1TB vs ~2.3TB).
Can you please help fix this very quickly? The whole group is completely blocked by the exceeded quota.
Thank you so much in advance. Have a nice weekend,
Geoffrey

No, but if you are saying it is the 3.5.3 rpm version, then that bug does not exist there.
And still it is weird how you are seeing such bad performance. :-/
Anything suspicious in the logs?
-Krutika
From: "Geoffrey Letessier" <geoffrey.letessier@xxxxxxx>
To: "Krutika Dhananjay" <kdhananj@xxxxxxxxxx>
Sent: Friday, June 26, 2015 1:27:16 PM
Subject: Re: GlusterFS 3.5.3 - untar: very poor performance
No, it's the 3.5.3 RPM version found in your repository (published in November 2014). So you suggest simply upgrading all servers and clients to the new 3.5.4 version? Wouldn't it be better to upgrade the whole system (servers and clients) to 3.7.1?
Geoffrey
Also, so are you running 3.5.3 rpms on the clients? Or is it a patched version with more fixes on top of 3.5.3?
The reason I ask is that there was one performance issue in the replication module introduced after 3.5.3 and fixed in 3.5.4. I'm wondering if that could be causing the issue you are experiencing.
-Krutika
From: "Geoffrey Letessier" <geoffrey.letessier@xxxxxxx>
To: "Krutika Dhananjay" <kdhananj@xxxxxxxxxx>
Sent: Friday, June 26, 2015 10:05:26 AM
Subject: Re: GlusterFS 3.5.3 - untar: very poor performance
Hi Krutika,
Oops, I disabled the quota manager without saving the configuration. Could you tell me how to retrieve the quota list information?
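One possible avenue, assuming GlusterFS 3.x quota semantics: the per-directory limits are stored as an extended attribute on the directories themselves, so they may still be readable directly on a brick backend even after the quota feature is disabled. A sketch only; verify the xattr name against your installed version, and note the brick path is taken from the du output earlier in the thread:

```shell
# Run as root on a brick server. Each directory that had a quota limit
# should carry a trusted.glusterfs.quota.limit-set xattr (GlusterFS 3.x).
getfattr -n trusted.glusterfs.quota.limit-set -e hex \
    /export/brick_home/brick1/derreumaux_team
```

If the xattrs are present, the old limits can be re-applied with `gluster volume quota <vol> limit-usage <path> <size>`.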
I’m gonna test the untar in the meantime.
Geoffrey

Hi,
So I tried the kernel source tree untar locally on a plain replicated (1x2) volume, and it took 7m30s on average. This was on VMs, with no RDMA and no quota enabled. Could you try the same thing on a volume without quota, to see if it makes a difference to the performance?
-Krutika
From: "Geoffrey Letessier" <geoffrey.letessier@xxxxxxx>
To: "Krutika Dhananjay" <kdhananj@xxxxxxxxxx>
Sent: Wednesday, June 24, 2015 10:21:13 AM
Subject: Re: GlusterFS 3.5.3 - untar: very poor performance
Hi Krutika,
OK, thank you very much in advance. Concerning the quota system, are you in touch with Vijaykumar? I have been waiting for an answer for a couple of days, if not more.
One more time, thank you. Have a nice day (in France it's 6:50 AM).
Geoffrey

Ok, so for anything related to replication, I can help you out. But for quota, it would be better to ask Vijaikumar Mallikarjuna or Raghavendra G on the mailing list.
I used to work on quota a long time back, but I am no longer in touch with the component and do not know of the latest changes to it.
For the performance issue, I will try linux kernel src untar on my machines and let you know what I find.
-Krutika
From: "Geoffrey Letessier" <geoffrey.letessier@xxxxxxx>
To: "Krutika Dhananjay" <kdhananj@xxxxxxxxxx>
Sent: Monday, June 22, 2015 9:00:52 PM
Subject: Re: GlusterFS 3.5.3 - untar: very poor performance
Hi Krutika,
Sorry for the delay, but I was in meetings all day.
> Good to hear from you as well. :)

;-)

> So you are seeing this bad performance only in 3.5.3? Any other releases you tried this test on, where the results were much better with replication?

Yes, but I'm not sure my issue only concerns this specific release. A few days ago the untar process (with the same version of GlusterFS) took around 8 minutes; now it takes around 32. Eight minutes was already too much, but what about 32? :)

That said, my problem only concerns small files: if I play with dd (or similar) on a single big file, everything is OK (client write throughput ~1GB/s, i.e. ~500MB/s into each replica).

If I run my bench on my distributed-only volume, I get good performance (untar: ~1m44s, etc.).

In addition, I don't know if it is important, but I have some trouble with the GlusterFS group quota: there are a lot of conflicts between the quota size and the actual file size, which don't match, plus many "quota xattrs not found" messages from the quota-verify glusterfs app. You can find an extract of the quota-verify output in the attachment.

> If so, could you please let me know? Meanwhile let me try the untar myself on my vms to see what could be causing the perf issue.

OK, thanks.
See you,
Geoffrey
Hi Geoffrey,
Good to hear from you as well. :)
Ok so you say disabling write-behind does not help. Makes me wonder what the problem could be.
So you are seeing this bad performance only in 3.5.3? Any other releases you tried this test on, where the results were much better with replication?
If so, could you please let me know? Meanwhile let me try the untar myself on my vms to see what could be causing the perf issue.
-Krutika
From: "Geoffrey Letessier" <geoffrey.letessier@xxxxxxx>
To: "Krutika Dhananjay" <kdhananj@xxxxxxxxxx>
Sent: Monday, June 22, 2015 10:14:26 AM
Subject: Re: GlusterFS 3.5.3 - untar: very poor performance
Hi Krutika,
It’s good to read you again :)
Here are my answers:
1. Could you remind me how to check whether self-heal is currently in progress? I don't notice anything special on the mount points (except the /var/run/gluster/vol_home one) or any dedicated process, but maybe I'm looking in the wrong place.
2. OK, I just disabled the write-behind parameter and reran the bench. I'll let you know more when I arrive at my office (I'm still at home at the moment).
See you, and thank you for helping.
Geoffrey

Hi Geoffrey,
1. Was self-heal also in progress while I/O was happening on the volume?
2. Also, there seem to be quite a few fsyncs, which could have slowed things down a bit. Could you disable write-behind and gather the time stats one more time? That would rule out the possibility that write-behind's out-of-order writes are increasing the number of fsyncs issued by the replication module.
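A minimal sketch of how these two checks might be run, assuming the standard gluster CLI subcommands (verify against your installed version's `gluster help` output):

```shell
# 1. Check whether self-heal has pending entries on the volume:
gluster volume heal vol_home info

# 2. Temporarily disable write-behind, rerun the benchmark, then restore it:
gluster volume set vol_home performance.write-behind off
# ... rerun the untar benchmark here ...
gluster volume set vol_home performance.write-behind on
```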
-Krutika
From: "Geoffrey Letessier" <geoffrey.letessier@xxxxxxx>
To: gluster-users@xxxxxxxxxxx
Sent: Saturday, June 20, 2015 6:04:40 AM
Subject: Re: GlusterFS 3.5.3 - untar: very poor performance
Re,
For comparison, here is the output of the same script run on a distributed-only volume (2 of the 4 servers described previously, 2 bricks each):

#######################################################
################ UNTAR time consumed ################
#######################################################

real    1m44.698s
user    0m8.891s
sys     0m8.353s

#######################################################
################# DU time consumed ##################
#######################################################

554M    linux-4.1-rc6

real    0m21.062s
user    0m0.100s
sys     0m1.040s

#######################################################
################# FIND time consumed ################
#######################################################

52663

real    0m21.325s
user    0m0.104s
sys     0m1.054s

#######################################################
################# GREP time consumed ################
#######################################################

7952

real    0m43.618s
user    0m0.922s
sys     0m3.626s

#######################################################
################# TAR time consumed #################
#######################################################

real    0m50.577s
user    0m29.745s
sys     0m4.086s

#######################################################
################# RM time consumed ##################
#######################################################

real    0m41.133s
user    0m0.171s
sys     0m2.522s
The difference in performance is astonishing!
Geoffrey

Dear all,
I just noticed that, on the main volume of my HPC cluster, my I/O operations have become terribly slow.
Doing some file operations on a compressed Linux kernel source archive (roughly 80MB, containing some 52,000 files), the untar operation alone can take more than half an hour, as you can see below:

#######################################################
################ UNTAR time consumed ################
#######################################################

real    32m42.967s
user    0m11.783s
sys     0m15.050s

#######################################################
################# DU time consumed ##################
#######################################################

557M    linux-4.1-rc6

real    0m25.060s
user    0m0.068s
sys     0m0.344s

#######################################################
################# FIND time consumed ################
#######################################################

52663

real    0m25.687s
user    0m0.084s
sys     0m0.387s

#######################################################
################# GREP time consumed ################
#######################################################

7952

real    2m15.890s
user    0m0.887s
sys     0m2.777s

#######################################################
################# TAR time consumed #################
#######################################################

real    1m5.551s
user    0m26.536s
sys     0m2.609s

#######################################################
################# RM time consumed ##################
#######################################################

real    2m51.485s
user    0m0.167s
sys     0m1.663s
For information, this volume is a distributed-replicated one, composed of 4 servers with 2 bricks each. Each brick is a 12-drive RAID6 vdisk with good native performance (around 1.2GB/s).
In comparison, when I use dd to generate a 100GB file on the same volume, my write throughput is around 1GB/s (client side) and 500MB/s (server side) because of the replication:

Client side:
[root@node056 ~]# ifstat -i ib0
       ib0
 KB/s in  KB/s out
 3251.45  1.09e+06
 3139.80  1.05e+06
 3185.29  1.06e+06
 3293.84  1.09e+06
...

Server side:
[root@lucifer ~]# ifstat -i ib0
       ib0
 KB/s in  KB/s out
561818.1   1746.42
560020.3   1737.92
526337.1   1648.20
513972.7   1613.69
...

dd command:
[root@node056 ~]# dd if=/dev/zero of=/home/root/test.dd bs=1M count=100000
100000+0 records in
100000+0 records out
104857600000 bytes (105 GB) copied, 202.99 s, 517 MB/s
So this issue does not seem to come from the network (InfiniBand, in this case).
You can find the following files in the attachments:
- mybench.sh: the bench script
- benches.txt: output of my "bench"
- profile.txt: gluster volume profile during the "bench"
- vol_status.txt: gluster volume status
- vol_info.txt: gluster volume info
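The attached mybench.sh is not reproduced in the thread, but the stage headers in its output suggest a sequence along the following lines. This is a synthetic sketch under stated assumptions: it generates a small tree instead of using the kernel sources, WORK is a made-up scratch path that should point at a directory on the volume's mount, and it is intended to be run with bash (for the `time` keyword):

```shell
# Synthetic sketch of the benchmark stages (UNTAR/DU/FIND/GREP/TAR/RM);
# point WORK at a directory on the volume under test.
WORK=${WORK:-/tmp/bench}
mkdir -p "$WORK/tree"
i=1
while [ "$i" -le 200 ]; do
    echo "struct inode line $i" > "$WORK/tree/f$i.c"
    i=$((i + 1))
done
tar -C "$WORK" -cf "$WORK/tree.tar" tree   # build the test archive
rm -rf "$WORK/tree"

time tar -C "$WORK" -xf "$WORK/tree.tar"           # UNTAR
time du -sh "$WORK/tree"                           # DU
time find "$WORK/tree" -type f | wc -l             # FIND
time grep -rl 'struct inode' "$WORK/tree" | wc -l  # GREP
time tar -cf /dev/null -C "$WORK" tree             # TAR
time rm -rf "$WORK/tree"                           # RM
```

Running the same sketch against a quota-enabled and a quota-free mount would isolate the quota translator's contribution to the small-file slowdown.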
Can someone help me fix this? It is very critical, because this volume is on an HPC cluster in production.
Thanks in advance,
Geoffrey

<benches.txt><mybench.sh><profile.txt><vol_info.txt><vol_status.txt>
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users