So, in doing some testing last week, I believe I managed to exhaust the number of threads available to nova-compute. After some investigation, I found the pthread_create failure and increased nproc for our Nova user to what I considered a ridiculous 120,000 threads, after reading that librados requires a thread per OSD, plus a few for overhead, per VM on our compute nodes.
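For reference, the bump itself was just a pam_limits entry along these lines (the filename is only an example, and depending on how nova-compute and libvirt are launched you may need to raise the limit in the service's init/unit configuration instead of, or in addition to, PAM):

    # /etc/security/limits.d/91-nova.conf  -- raise per-user process/thread cap
    nova    soft    nproc    120000
    nova    hard    nproc    120000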
This made me wonder: how many threads could Ceph possibly need on one of our compute nodes?
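(For what it's worth, the quick way I've been sanity-checking the actual per-guest thread count, assuming the guests show up as qemu-kvm / qemu-system processes, is something like:

    ps -o nlwp= -p $(pgrep -f qemu | head -1)

i.e. the number of lightweight processes for a single qemu process.)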
32 cores * an overcommit ratio of 16, assuming every VM is booted from a Ceph volume, * 300 OSDs (the approximate number of disks in our soon-to-go-live Ceph cluster) = 153,600 threads.
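Spelled out, that back-of-the-envelope math (still using my roughly-one-thread-per-OSD assumption from above) is:

    32 cores  x  16 overcommit                 =     512 VMs per node
    512 VMs   x  300 OSDs  x  ~1 thread/OSD    =~ 153,600 threads per node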
So this is where I started to put the truck in reverse. Am I right? What about when we triple the size of our Ceph cluster? I can easily see a future where we have 1,000 disks, if not many more, in our cluster. How do people scale this? Do you RAID your disks to increase the density of your Ceph cluster? I can only imagine this will also drastically increase the resources required on my data nodes as well.
So... suggestions? Reading?