Dear Mohammed Rafi,

thanks for getting back to me!

> If you have the problem still bugging you, or if you have any previous
> logs that you can share with me, that will help to analyze further.

I have collected the logs from the server and one client; it's a 21MB
archive. How can I provide it?

(I'm not sure how complete the collection is: unfortunately some time
has already passed, so client nodes have been terminated and their logs
have been lost. Also, the issues were happening at the beginning of
November, so some logs have simply been rotated out of existence by now.)

My replies to your questions are inline below.

> On 11/17/2016 07:22 PM, Riccardo Murri wrote:
> > Hello,
> >
> > we are trying out GlusterFS as the working filesystem for a compute cluster;
> > the cluster is comprised of 57 compute nodes (55 cores each), acting as
> > GlusterFS clients, and 25 data server nodes (8 cores each), serving
> > 1 large GlusterFS brick each.
> >
> > We currently have noticed a couple of issues:
> >
> > 1) When compute jobs run, the `glusterfs` client process on the compute nodes
> > goes up to 100% CPU, and filesystem operations start to slow down a lot.
> > Since there are many CPUs available, is it possible to make it use, e.g.,
> > 4 CPUs instead of one to make it more responsive?
>
> Can you just briefly describe your computing job and workloads, to see
> what operations are happening on the cluster?

We built a cluster with 47 compute nodes, each with 56 cores. The
compute nodes were acting as GlusterFS clients (FUSE) to 25 GlusterFS
servers, each with 8 cores and 32 GB of RAM. Each server was serving a
single 10TB brick (ext4-formatted), for a grand total of 250TB.

The compute nodes were running the "rockstar" [1] program, one job per
node, so about 45 jobs concurrently running [2], driven by a shell
script that was performing a number of file-existence probes while the
main program was running, e.g.
(Perl)::

    sleep 1 while (!(-e "auto-rockstar.cfg"));  # wait for server to start

Users of the cluster reported that many jobs failed or stalled because
these existence tests were never succeeding, or files would disappear
after having been created.

[1]: https://bitbucket.org/gfcstanford/rockstar
[2]: although one job could span many processes

> > 2) In addition (but possibly related to 1), we have an issue with files
> > disappearing and re-appearing: from a compute process we test for the
> > existence of a file, and e.g. `test -e /glusterfs/file.txt` fails.
> > Then we test from a different process or shell and the file is there.
> > As far as I can see, the servers are basically idle, and none of the
> > peers is disconnected.
> >
> > We are running GlusterFS 3.7.17 on Ubuntu 16.04, installed from the
> > Launchpad PPA. (Details below for the interested.)
> >
> > Can you give any hint about what's going on?
>
> Is there any rebalance happening? Tell me more about any ongoing
> operations (internal operations like rebalance, shd, etc., or client
> operations).

If any rebalance happened, it was triggered automatically by the system.
It might be relevant that at some point the free space dropped to 0 (too
much output from the jobs); this might have thrown off some internal
healing operation.

Basically, the sequence of operations was like this:

- create the cluster
- fill `/glusterfs` with input data: ~200TB copied with `rsync`, no problems
- start 1000 "rockstar" jobs; issues begin as jobs stall and never complete
- reboot all GlusterFS servers and unmount/remount the filesystem on the
  clients, attempting to cure the problem
- reduce the number of compute nodes to 10 (= 560 cores); the job failure
  rate decreases to an acceptable level

I could only get limited reports/data points from the users: they were
in a hurry to process the data because of a deadline and did not want to
sit down and debug the issue to its roots.
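For reference, the existence probe in the job scripts could be made less
fragile by giving up after a deadline instead of spinning forever when a
file never shows up. A minimal sketch (the `wait_for_file` helper name
and its parameters are hypothetical, not part of the actual job scripts):

```shell
#!/bin/sh
# Hypothetical helper: poll for a file once per second, but give up
# after a timeout instead of waiting forever (as the original
# "sleep 1 while (!(-e ...))" probe does).
wait_for_file() {
    file=$1
    timeout=${2:-300}   # seconds to wait before giving up (default 5 min)
    elapsed=0
    while [ ! -e "$file" ]; do
        [ "$elapsed" -ge "$timeout" ] && return 1   # timed out
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0   # file appeared within the deadline
}
```

A job script could then do `wait_for_file auto-rockstar.cfg 600 || exit 1`
and fail loudly rather than stall indefinitely.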
I am still quite interested in sorting this problem out, as the same
issue might resurface if we need to build a large cluster again.

> Also some insight about your volume configuration will also help:
> volume info and volume status.

Here it is::

    ubuntu@data001:~$ sudo gluster volume info

    Volume Name: glusterfs
    Type: Distribute
    Volume ID: fdca65bd-313c-47fa-8a09-222f794951ed
    Status: Started
    Number of Bricks: 25
    Transport-type: tcp
    Bricks:
    Brick1: data001:/srv/glusterfs
    Brick2: data002:/srv/glusterfs
    Brick3: data003:/srv/glusterfs
    Brick4: data004:/srv/glusterfs
    Brick5: data005:/srv/glusterfs
    Brick6: data006:/srv/glusterfs
    Brick7: data007:/srv/glusterfs
    Brick8: data008:/srv/glusterfs
    Brick9: data009:/srv/glusterfs
    Brick10: data010:/srv/glusterfs
    Brick11: data011:/srv/glusterfs
    Brick12: data012:/srv/glusterfs
    Brick13: data013:/srv/glusterfs
    Brick14: data014:/srv/glusterfs
    Brick15: data015:/srv/glusterfs
    Brick16: data016:/srv/glusterfs
    Brick17: data017:/srv/glusterfs
    Brick18: data018:/srv/glusterfs
    Brick19: data019:/srv/glusterfs
    Brick20: data020:/srv/glusterfs
    Brick21: data021:/srv/glusterfs
    Brick22: data022:/srv/glusterfs
    Brick23: data023:/srv/glusterfs
    Brick24: data024:/srv/glusterfs
    Brick25: data025:/srv/glusterfs
    Options Reconfigured:
    performance.readdir-ahead: on

    ubuntu@data001:~$ sudo gluster volume status
    Status of volume: glusterfs
    Gluster process                             TCP Port  RDMA Port  Online  Pid
    ------------------------------------------------------------------------------
    Brick data001:/srv/glusterfs                49153     0          Y       1462
    Brick data002:/srv/glusterfs                49153     0          Y       1459
    Brick data003:/srv/glusterfs                49153     0          Y       1463
    Brick data004:/srv/glusterfs                49153     0          Y       1460
    Brick data005:/srv/glusterfs                49153     0          Y       1459
    Brick data006:/srv/glusterfs                49153     0          Y       1748
    Brick data007:/srv/glusterfs                49153     0          Y       1457
    Brick data008:/srv/glusterfs                49153     0          Y       1498
    Brick data009:/srv/glusterfs                49153     0          Y       1469
    Brick data010:/srv/glusterfs                49153     0          Y       1489
    Brick data011:/srv/glusterfs                49153     0          Y       1470
    Brick data012:/srv/glusterfs                49153     0          Y       1458
    Brick data013:/srv/glusterfs                49153     0          Y       1475
    Brick data014:/srv/glusterfs                49153     0          Y       1464
    Brick data015:/srv/glusterfs                49153     0          Y       1459
    Brick data016:/srv/glusterfs                49153     0          Y       1465
    Brick data017:/srv/glusterfs                49153     0          Y       1466
    Brick data018:/srv/glusterfs                49153     0          Y       1467
    Brick data019:/srv/glusterfs                49153     0          Y       1464
    Brick data020:/srv/glusterfs                49153     0          Y       1460
    Brick data021:/srv/glusterfs                49153     0          Y       1556
    Brick data022:/srv/glusterfs                49153     0          Y       1458
    Brick data023:/srv/glusterfs                49153     0          Y       1472
    Brick data024:/srv/glusterfs                49153     0          Y       1767
    Brick data025:/srv/glusterfs                49153     0          Y       1470
    NFS Server on localhost                     2049      0          Y       17383
    NFS Server on data011                       2049      0          Y       14638
    NFS Server on data022                       2049      0          Y       12485
    NFS Server on data004                       2049      0          Y       15197
    NFS Server on data007                       2049      0          Y       15006
    NFS Server on data021                       2049      0          Y       13631
    NFS Server on data019                       2049      0          Y       14421
    NFS Server on data008                       2049      0          Y       13506
    NFS Server on data013                       2049      0          Y       15965
    NFS Server on data014                       2049      0          Y       13231
    NFS Server on data005                       2049      0          Y       13370
    NFS Server on data017                       2049      0          Y       15316
    NFS Server on data003                       2049      0          Y       15359
    NFS Server on data002                       2049      0          Y       12681
    NFS Server on data024                       2049      0          Y       14263
    NFS Server on data025                       2049      0          Y       12560
    NFS Server on data016                       2049      0          Y       14761
    NFS Server on data023                       2049      0          Y       13165
    NFS Server on data020                       2049      0          Y       12769
    NFS Server on data018                       2049      0          Y       13789
    NFS Server on data006                       2049      0          Y       13429
    NFS Server on data015                       2049      0          Y       13423
    NFS Server on data009                       2049      0          Y       15343
    NFS Server on data010                       2049      0          Y       13189
    NFS Server on data012                       2049      0          Y       12690

    Task Status of Volume glusterfs
    ------------------------------------------------------------------------------
    There are no active volume tasks

We build ephemeral clusters of VMs on an OpenStack infrastructure; the
clusters are destroyed once the batch of computations is done.
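(Regarding question 1 above: as far as I understand, the number of epoll
event threads can be raised per volume, which might let the client use
more than one CPU. I have not yet verified this on 3.7.17, so the
following is just an untested sketch, with illustrative values:)

```shell
# Untested sketch: raise the client- and server-side event thread
# counts for the "glusterfs" volume (the default is 2, as far as I know;
# the value 4 is only an example).
gluster volume set glusterfs client.event-threads 4
gluster volume set glusterfs server.event-threads 4
```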
The GlusterFS server configuration is done by Ansible; the relevant
playbook is at:

    https://github.com/gc3-uzh-ch/elasticluster/blob/master/elasticluster/share/playbooks/roles/glusterfs-server/tasks/export.yml

This is the `/etc/glusterfs/glusterd.vol` generated as a result::

    volume management
        type mgmt/glusterd
        option working-directory /var/lib/glusterd
        option transport-type socket,rdma
        option transport.socket.keepalive-time 10
        option transport.socket.keepalive-interval 2
        option transport.socket.read-fail-log off
        option ping-timeout 0
        option event-threads 1
    #   option base-port 49152
    end-volume

The GlusterFS clients simply do::

    mount -t glusterfs data001:/srv/glusterfs /glusterfs

Thanks for your help!

Riccardo

--
Riccardo Murri
http://www.s3it.uzh.ch/about/team/#Riccardo.Murri

S3IT: Services and Support for Science IT
University of Zurich
Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)

Tel: +41 44 635 4208
Fax: +41 44 635 6888

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users