Gluster builders being hit by too many processes

Michael Scherer <mscherer@xxxxxxxxxx> · Fri, 06 Oct 2017 11:01:19 +0200

Hi,

Over the last 2 days, I have been contacted by people because some
builders were failing. Upon investigation
( https://bugzilla.redhat.com/show_bug.cgi?id=1498390 ), the main issue
seems to be the following:

Each failed build had a set of glusterd processes (around 300) that were
started by Jenkins to test regressions for this change:
https://review.gluster.org/#/c/18271/ 

(found via the environment variables of the processes)
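
For anyone who wants to repeat that kind of lookup, here is a rough
sketch, assuming a Linux /proc filesystem and enough privileges to read
another process's environment. The helper names are made up for this
example, and JOB_NAME / BUILD_URL are simply the standard Jenkins
variables I would look at first, not necessarily the ones used in the
investigation.

```python
import os


def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE blob found in /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" not in entry:
            continue  # skip empty trailing entry or malformed data
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def jenkins_origin(pid: int, keys=("JOB_NAME", "BUILD_URL")) -> dict:
    """Return the selected variables from a process's environment.

    Linux only; needs to run as root or as the owner of the process.
    """
    with open(f"/proc/{pid}/environ", "rb") as f:
        env = parse_environ(f.read())
    return {k: env[k] for k in keys if k in env}


def find_processes(name: bytes = b"glusterd"):
    """Yield PIDs whose command name equals `name` (hypothetical helper)."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm", "rb") as f:
                if f.read().strip() == name:
                    yield int(entry)
        except OSError:
            continue  # process exited while we were scanning
```

Calling jenkins_origin(pid) for each PID from find_processes() should be
enough to group the leftover processes by the job that spawned them.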

But upon closer inspection, the patch doesn't seem buggy, so my
suspicion falls on the test case, which is also quite simple (and
likely bug-free) but seems to start a ton of volumes (around 1000, if I
am not wrong), and this does seem to result in a large number of
processes being created.

See https://review.gluster.org/#/c/18271/5/tests/bugs/cli/bug-1490853.t

Could it be that the test case uncovers a bug in the test suite, or a
bug in gluster?

Looking at the test suite, I see that the cleanup function is
conveniently ignoring a ton of errors:
https://github.com/gluster/glusterfs/blob/master/tests/include.rc#L465
which does not help to see what is going wrong.
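
To illustrate the alternative I have in mind, here is a minimal sketch
of a cleanup routine that still runs every step but keeps a record of
what failed. It is written in Python rather than in the shell used by
the test suite, and the function names and the idea of passing the
steps as a list of commands are assumptions of this example, not how
include.rc is organised.

```python
import subprocess
import sys


def run_cleanup(steps):
    """Run every cleanup step; never abort early, but remember what failed.

    `steps` is a list of argv lists. Returns a list of
    (argv, exit_code_or_None, error_text) tuples for the failed steps.
    """
    failures = []
    for argv in steps:
        try:
            proc = subprocess.run(argv, capture_output=True, text=True)
        except OSError as exc:  # e.g. the command does not exist
            failures.append((argv, None, str(exc)))
            continue
        if proc.returncode != 0:
            failures.append((argv, proc.returncode, proc.stderr.strip()))
    return failures


def report(failures, stream=sys.stderr):
    """Print one line per failed step so the failure shows up in the job log."""
    for argv, code, err in failures:
        print(f"cleanup step {argv!r} failed (exit={code}): {err}", file=stream)
```

The point is only that a failing step gets logged instead of vanishing;
whether a cleanup failure should also fail the job is a separate
decision.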

I also do not see out-of-memory errors, but the munin graphs seem to
stop right at the same time, so maybe that's just a resource issue.

So my questions are:
- Is gluster supposed to scale gracefully with 1000 volumes on a single
  node?
- How many resources should we plan for that? (Right now we have 2G
  VMs; we can't increase that much without reinstalling the whole set
  of servers.)
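
To put the second question in numbers, here is a back-of-the-envelope
sketch. It assumes that "2G" means 2048 MiB, that nothing else on the
VM needs memory, and, purely for illustration, that each volume ends up
with one process of its own; none of these assumptions has been checked
against the builders.

```python
def per_process_budget_mib(total_ram_mib, n_processes, reserved_mib=0):
    """Average memory available per process, in MiB.

    `reserved_mib` is a hypothetical allowance for the OS and Jenkins agent.
    """
    if n_processes <= 0:
        raise ValueError("n_processes must be positive")
    return (total_ram_mib - reserved_mib) / n_processes


# Assumed figures: a 2048 MiB VM, 1000 volumes, ~300 processes observed.
for count in (300, 1000):
    budget = per_process_budget_mib(2048, count)
    print(f"{count} processes: about {budget:.2f} MiB each")
```

Under those assumptions the budget is roughly 2 MiB per process for
1000 processes and just under 7 MiB for 300, which gives an idea of how
tight a 2G VM would be.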

If you see any builders not working, please ping me on IRC.

-- 
Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS
