This is my first attempt at running Gluster, and so far it's not going well. I've got a cluster of 150 machines (this is in a university environment) that were previously all mounted to an NFS share on the cluster's head node. To make the cluster more expandable, and theoretically increase file I/O speeds, I decided to switch over to a distributed file system. I configured it with three storage nodes, one brick per node, running Gluster in dispersed mode.

At first it seemed to be running fine, but when I tested it with simultaneous reads/writes it got really slow. If I run 'kash sleep 15', all 150 nodes will sleep for 15 seconds. If I instead create a script called runSleep on the Gluster mount that does the same thing and execute it on all 150 nodes simultaneously, it takes 3-4 minutes to complete!
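For reference, here is roughly how the volume was set up and how I'm running the test. The hostnames, brick paths, and mount point below are placeholders, not my exact values:

    # Dispersed volume across the three storage nodes, one brick each
    gluster volume create gv0 disperse 3 redundancy 1 \
        storage1:/export/brick1 storage2:/export/brick1 storage3:/export/brick1
    gluster volume start gv0

    # On each compute node, FUSE mount of the volume
    mount -t glusterfs storage1:/gv0 /mnt/gluster

    # runSleep is just a two-line script on the mount:
    #   #!/bin/bash
    #   sleep 15
    chmod +x /mnt/gluster/runSleep

    kash sleep 15                  # finishes in ~15 seconds on all 150 nodes
    kash /mnt/gluster/runSleep     # takes 3-4 minutes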
Here are a few things I've done to try to narrow the problem down:
- I unmounted the Gluster volume and re-mounted it as NFS instead of using FUSE. Same results as before.
- I deleted the Gluster volume, created a plain NFS share on one of the storage nodes, then mounted that share on all of the compute nodes. This ran just fine with no noticeable delay at all.
- I created a new Gluster volume that's distributed, but only uses one brick. This ran just as slowly as my original case. (Rough commands for this test are below.)
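Roughly what that single-brick test looked like (again, the names and paths are placeholders):

    # Plain distributed volume with a single brick on one storage node
    gluster volume create gv-single storage1:/export/brick-single
    gluster volume start gv-single

    # FUSE mount on the compute nodes, then the same test as before
    mount -t glusterfs storage1:/gv-single /mnt/gluster
    kash /mnt/gluster/runSleep     # still takes 3-4 minutes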
Running 'gluster volume profile' on the storage node from the third test above (the single-brick volume), I noticed that the latency seems really high for the last three operations listed: OPEN, LOOKUP, and FSYNC. FSYNC shows an average latency of 708230.64 us, i.e. about 0.7 seconds per call.
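In case it matters, I gathered those numbers with the standard profile commands, roughly (volume name is a placeholder):

    gluster volume profile gv-single start
    # ... run the kash test against the mount ...
    gluster volume profile gv-single info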
The nodes are all running Ubuntu 16.04.
Any suggestions?
Thanks,
Kevin