Hello, users and devs.
TL;DR: A single gluster client can essentially cause a denial of service / availability loss for an entire gluster array. There's no way to stop it and almost no way to find the offending client. Probably all versions are affected (at least 3.6 and 3.7 are).
In either of these situations, one glusterfsd process on whatever peer the client is currently talking to will skyrocket to *nproc* x 100% cpu usage (800% on 8 cores, 1600% on 16) and the storage cluster becomes essentially useless: every other client will eventually try to read or write data on the overloaded peer and, when that happens, its connection will hang. Heals between peers hang as well, because the load average on that peer is around 1.5x the number of cores or more. This occurs in both gluster 3.6 and 3.7, is very repeatable, and happens much too frequently.
More importantly, though, there must be some feature enforced to stop one user from being able to render the entire filesystem unavailable for all other users. In the worst case, I would even prefer a gluster volume option that simply disconnects clients making more than some threshold of file open requests. That's far preferable to a complete availability loss reminiscent of a DDoS attack...
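To make the idea concrete, here is a rough sketch of the kind of per-client threshold I have in mind. It is not gluster code and no such volume option exists as far as I know; the class name, the parameters and the way clients are identified are all made up for illustration. It only shows the bookkeeping: count each client's open requests in a sliding time window and flag the client once it goes over the limit.

```python
from collections import defaultdict, deque


class OpenRateLimiter:
    """Hypothetical per-client limiter for file-open requests (illustration only)."""

    def __init__(self, max_opens, window_seconds):
        # Assumed tunables; a real option would need sensible defaults.
        self.max_opens = max_opens
        self.window_seconds = window_seconds
        # client id -> timestamps of that client's recent open requests
        self._events = defaultdict(deque)

    def record_open(self, client_id, now):
        """Record one open request at time `now` (seconds).

        Returns True if the client is over the threshold and should be
        disconnected, False otherwise.
        """
        events = self._events[client_id]
        events.append(now)
        # Discard timestamps that have fallen out of the sliding window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        return len(events) > self.max_opens


if __name__ == "__main__":
    limiter = OpenRateLimiter(max_opens=3, window_seconds=1.0)
    for t in (0.0, 0.1, 0.2, 0.3):
        over = limiter.record_open("client-a", t)
        print(f"t={t:.1f} over_threshold={over}")
```

Each client is tracked separately, so one noisy client going over the limit does not affect the others, and a client that backs off drops under the threshold again once its old requests age out of the window.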
Apologies for the essay and looking forward to any help you can provide.
Thanks,
Patrick
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel