Hi,

Since I upgraded GlusterFS from 3.5.3 to 3.7.x to solve my quota miscalculation and poor performance (as advised by the user support team), we have been out of production for roughly 7 weeks because of the many 3.7.x issues we keep hitting:

- T-file appearance: I notice a lot of T files (permissions ---------T) in my brick paths. Vijay explained to me that T files appear when a rename is performed or when a brick is added/removed; but the problem is that since I completely re-created the volume (with RAID initialization, etc.) and imported my data into it, I have renamed nothing and never added or removed any brick. So why are these T files present in my new volume? For example, for my /derreumaux_team directory, I count 13891 real files and 704 T files across the brick paths (a sketch of how I count them follows this list). How can I clean them up without side effects? The first time I noticed this kind of file was after setting a quota below the real size of the path, which resulted in some quota explosions (quota daemon failure) and T-file appearances.

- 7006 files in split-brain status after transferring my data back (30TB, 6.2M files) from a backup server into the freshly created volume. Thanks to Mathieu Chateau, who put me on the right track (GFID vs. real file path), this problem has been fixed manually (see the sketch after this list).

- Log issue: after creating a single 35GB file, I can see more than 186000 new lines in the brick log files. I can stop them by setting brick-log-level to CRITICAL (command below), but I guess this issue severely impacts I/O performance and throughput. Vijay told me he has fixed this problem in the code, but I apparently have to wait for the next release to benefit from it… Very nice for production! In fact, if I don't set brick-log-level to CRITICAL, I can fill my /var partition (10GB) in less than one day just by running tests/benchmarks on the volume.

- Volume healing issue: slightly fewer than 14000 files were in a bad state (# gluster volume heal <vol_home> info) and a new forced heal on the volume made no difference. Thanks to Krutika and Pranith, this problem is now fixed.

- du/df/stat/etc. hangs caused by the RDMA protocol. This problem seems not to occur anymore since I upgraded from GlusterFS 3.7.2 to 3.7.3. It was probably due to the brick crashes (a few minutes to a few days after [re]starting the volume) we had with the RDMA transport type. I only noticed it with 3.7.2.

- Quota problem: after I successfully forced the quota recalculation (a simple du on each path carrying a quota, sketched below), the values were correct for a couple of days, then the quota daemon failed again (quota explosions, etc.).

- A lot of warnings during TAR operations on replicated volumes: "tar: linux-4.1-rc6/sound/soc/codecs/wm8962.c : fichier modifié pendant sa lecture" (i.e. "file changed as we read it").

- Low I/O performance and throughput: 1) if I enable the quota feature, my I/O throughput is divided by 2, so for the moment I have disabled it (this only happens since I upgraded to 3.7.x); 2) since I upgraded GlusterFS from 3.5.3 to 3.7.3, my I/O performance and throughput are lower than before, as you can read in the benchmark tables further below.
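For reference, here is roughly how I count real files vs. T files on one brick. DHT link-to files carry only the sticky bit and are normally 0 bytes; the brick path below is only an example, not my real layout:

    # regular data files on the brick, excluding the .glusterfs tree
    find /export/brick1/derreumaux_team -type f ! -perm 1000 ! -path '*/.glusterfs/*' | wc -l

    # DHT link-to files (---------T), normally empty
    find /export/brick1/derreumaux_team -type f -perm 1000 -size 0 ! -path '*/.glusterfs/*' | wc -l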
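And for the record, the manual split-brain fix was along these lines (a sketch only; the file name and GFID are made up, and the rm commands are run on the brick holding the bad copy):

    # list the files in split-brain
    gluster volume heal vol_home info split-brain

    # on the brick with the bad copy: remove the file and its GFID hard link
    rm /export/brick1/derreumaux_team/some_file
    rm /export/brick1/.glusterfs/ab/cd/abcdef12-3456-7890-abcd-ef1234567890

    # then trigger a heal so the good copy is propagated back
    gluster volume heal vol_home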
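The log-level workaround I mentioned is simply:

    gluster volume set vol_home diagnostics.brick-log-level CRITICAL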
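And the quota recalculation I forced was nothing more than a du, through the FUSE mount, on every directory that carries a limit (the mount point shown is just an example):

    # list the configured limits, then walk each limited directory from a client mount
    gluster volume quota vol_home list
    du -s /mnt/vol_home/derreumaux_team    # repeated for each directory with a quota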
(Keeping in mind that the quota feature is disabled.) I/O operation tests with a Linux kernel archive (80MB tarball, ~53000 files, 550MB uncompressed):

------------------------------------------------------------------------
|                         PRODUCTION HARDWARE                          |
------------------------------------------------------------------------
|             |  UNTAR  |   DU   |  FIND  |  GREP  |  TAR   |   RM   |
------------------------------------------------------------------------
| native FS   |  ~16s   |  ~0.1s |  ~0.1s |  ~0.1s |  ~24s  |  ~3s   |
------------------------------------------------------------------------
|                        GlusterFS version 3.5.3                       |
------------------------------------------------------------------------
| distributed |  ~2m57s |  ~23s  |  ~22s  |  ~49s  |  ~50s  |  ~54s  |
------------------------------------------------------------------------
| dist-repl   | ~29m56s |  ~1m5s | ~1m04s | ~1m32s | ~1m31s | ~2m40s |
------------------------------------------------------------------------
|                        GlusterFS version 3.7.3                       |
------------------------------------------------------------------------
| distributed |  ~2m49s |  ~20s  |  ~29s  |  ~58s  |  ~60s  |  ~41s  |
------------------------------------------------------------------------
| dist-repl   | ~28m24s |  ~51s  |  ~37s  | ~1m16s | ~1m14s | ~1m17s |
------------------------------------------------------------------------

*:
- distributed: 4 bricks (2 bricks on each of 2 servers)
- dist-repl: 4 bricks (2 bricks on each of 2 servers) per replica, 2 replicas
- native FS: each brick path directly (XFS)

And the craziest thing is that I ran the same test on a crash-test storage cluster (2 old Dell servers, each brick a single 2TB 7.2k hard drive, 2 bricks per server) and its performance exceeds that of the production hardware (4 recent servers, 2 bricks each, each brick a 24TB RAID6 behind a good LSI RAID controller, 1 controller per brick):

------------------------------------------------------------------------
|                          CRASHTEST HARDWARE                          |
------------------------------------------------------------------------
|              |  UNTAR  |  DU   |  FIND  |  GREP  |  TAR   |   RM   |
------------------------------------------------------------------------
| native FS    |  ~19s   | ~0.2s |  ~0.1s |  ~1.2s |  ~29s  |  ~2s   |
------------------------------------------------------------------------
| single       |  ~3m45s |  ~43s |  ~47s  |        | ~3m10s | ~3m15s |
------------------------------------------------------------------------
| single v2*   |  ~3m24s |  ~13s |  ~33s  | ~1m10s |  ~46s  |  ~48s  |
------------------------------------------------------------------------
| single NFS   | ~23m51s |  ~3s  |  ~1s   |  ~27s  |  ~36s  |  ~13s  |
------------------------------------------------------------------------
| replicated   |  ~5m10s |  ~59s | ~1m6s  |        | ~1m19s | ~1m49s |
------------------------------------------------------------------------
| distributed  |  ~4m18s |  ~41s |  ~57s  |        | ~2m24s | ~1m38s |
------------------------------------------------------------------------
| dist-repl    |  ~7m1s  |  ~19s |  ~31s  | ~1m34s | ~1m26s | ~2m11s |
------------------------------------------------------------------------
| FhGFS (dist) |  ~3m33s |  ~15s |  ~2s   | ~1m31s | ~1m31s |  ~52s  |
------------------------------------------------------------------------

*: with default parameters

Throughput is now around 500-600 MB/s with RDMA and 150-300 MB/s with TCP for the dist-repl volume, and around 600-700 MB/s with RDMA and 500-600 MB/s with TCP for the distributed volume.
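For completeness, the six operations timed in the tables above are essentially the following, run from a client mount of the volume (the mount point, archive name and grep pattern shown here are just examples):

    cd /mnt/vol_home/bench
    time tar xf linux-4.1-rc6.tar.xz                         # UNTAR
    time du -sh linux-4.1-rc6                                # DU
    time find linux-4.1-rc6 -type f | wc -l                  # FIND
    time grep -r MODULE_LICENSE linux-4.1-rc6 > /dev/null    # GREP
    time tar cf /tmp/linux-4.1-rc6.tar linux-4.1-rc6         # TAR
    time rm -rf linux-4.1-rc6                                # RM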
Could you help us get our HPC center back into production by solving the above-mentioned issues? Or would you advise me to downgrade to v3.5.3 (the most stable version I have known since I started running GlusterFS in production)? Or should I move on? ;-)

Thanks in advance.
Geoffrey
------------------------------------------------------
Geoffrey Letessier
IT manager & systems engineer
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx