Hi,

Over the last 2 days, I have been contacted by people because some builders were failing. Upon investigation (https://bugzilla.redhat.com/show_bug.cgi?id=1498390), the main issue seems to be the following:

Each failing builder had a set of glusterd processes (around 300) that were started by jenkins to run the regression tests for this change:
https://review.gluster.org/#/c/18271/
(found through the environment variables of the processes)

On closer inspection, the patch itself doesn't seem buggy, so my suspicion falls on the test case, which is also quite simple (and likely bug free), but which seems to start a ton of volumes (around 1000, if I am not wrong), and this does seem to result in a large number of processes being created.

See https://review.gluster.org/#/c/18271/5/tests/bugs/cli/bug-1490853.t

Could it be that the test case uncovers a bug in the test suite, or a bug in gluster?

Looking at the test suite, I see that the cleanup function is conveniently ignoring a ton of errors:
https://github.com/gluster/glusterfs/blob/master/tests/include.rc#L465
which does not help to see what is going wrong.

I also do not see any out-of-memory errors, but the munin graphs seem to stop right at the same time, so maybe that's just a resource issue.

So my questions are:
- is gluster supposed to scale gracefully with 1000 volumes on a single node?
- how many resources should we plan for that? (right now, we have 2G VMs, and we can't increase that much without reinstalling the whole set of servers)

If you see any builders not working, please ping me on irc.

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS