Re: ulimit max user processes (-u) and non-root ceph clients

Hi,

Sorry to revive this old thread, but I wanted to update you on the pains we're currently going through related to clients' nproc (and now nofile) ulimits. When I started this thread we were using RBD for Glance images only, but now we're trying to enable RBD-backed Cinder volumes, and at the moment we're not really succeeding :(

As we had guessed from our earlier experience, librbd, and therefore qemu-kvm, needs increased nproc/nofile limits, otherwise VMs will freeze. In fact, we just observed a lockup of a test VM due to the RBD device blocking completely (this shows up as blocked flush processes inside the VM); we're not sure which of the nproc/nofile limits caused the freeze, but it was surely one of those.
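
For anyone debugging the same kind of freeze, something like the following (the PID lookup and the <pid> placeholder are illustrative) lets you compare a running qemu-kvm process against its limits:

  # find the qemu-kvm process for the affected guest
  pgrep -f qemu-kvm

  # effective limits of the running process
  grep -E 'Max processes|Max open files' /proc/<pid>/limits

  # current thread and open-fd counts, to compare against those limits
  ls /proc/<pid>/task | wc -l
  ls /proc/<pid>/fd | wc -l

If either count is close to its limit when the flushes block, that's your culprit.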

And the main problem we face now is that it isn't trivial to increase the limits of qemu-kvm on a running OpenStack hypervisor -- the values are set by libvirtd, and picking up a qemu config change seems to require restarting all guest VMs on the host. I'll update this thread when we find a solution to that...
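
For reference, the knobs we're experimenting with are in /etc/libvirt/qemu.conf; the values below are illustrative only, and max_files needs a reasonably recent libvirt:

  # /etc/libvirt/qemu.conf
  max_processes = 32768
  max_files = 32768

Restarting libvirtd picks these up for newly started guests, but, as said above, guests that are already running keep their old limits.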

Moving forward, IMHO it would be much better if Ceph clients could gracefully work with large clusters without _requiring_ changes to the ulimits. I understand that such poorly configured clients would necessarily see decreased performance (since librados would need to use a thread pool and also lose some of the persistent client-OSD connections), but client lockups are IMHO worse than slightly lower performance.

Have you guys discussed the client ulimit issues recently and is there a plan in the works?

Best Regards,
Dan, CERN IT/DSS

On Sep 19, 2013 6:10 PM, "Gregory Farnum" <greg@xxxxxxxxxxx> wrote:
On Wed, Sep 18, 2013 at 11:43 PM, Dan Van Der Ster
<daniel.vanderster@xxxxxxx> wrote:
>
> On Sep 18, 2013, at 11:50 PM, Gregory Farnum <greg@xxxxxxxxxxx>
>  wrote:
>
>> On Wed, Sep 18, 2013 at 6:33 AM, Dan Van Der Ster
>> <daniel.vanderster@xxxxxxx> wrote:
>>> Hi,
>>> We just finished debugging a problem with RBD-backed Glance image creation failures, and thought our workaround would be useful for others. Basically, we found that during an image upload, librbd on the glance api server was consuming many, many processes, eventually hitting the 1024 nproc limit for non-root users in RHEL. The failure occurred when uploading to pools with 2048 PGs, but not when uploading to pools with 512 PGs (we're guessing that librbd opens one thread per accessed PG, and doesn't close those threads until the whole process completes.)
>>>
>>> If you hit this same problem (and you run RHEL like us), you'll need to modify at least /etc/security/limits.d/90-nproc.conf (adding your non-root user that should be allowed more than 1024 procs), and then possibly also raise ulimit -u in the init script of your client process. Ubuntu should have some similar limits.
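>>>
>>> For example, on RHEL 6 the change looks roughly like this (user and value are illustrative):
>>>
>>>   # /etc/security/limits.d/90-nproc.conf
>>>   *         soft    nproc     1024
>>>   glance    soft    nproc     4096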
>>
>> Did your pools with 2048 PGs have a significantly larger number of
>> OSDs in them? Or are both pools on a cluster with a lot of OSDs relative
>> to the PG counts?
>
> 1056 OSDs at the moment.
>
> While uploading a 14GB image, we observed up to ~1500 threads.
>
> We set the glance client to allow 4096 processes for now.
>
>
>> The PG count shouldn't matter for this directly, but RBD (and other
>> clients) will create a couple messenger threads for each OSD it talks
>> to, and while they'll eventually shut down on idle it doesn't
>> proactively close them. I'd expect this to be a problem around 500
>> OSDs.
>
> A couple: is that the upper limit? Should we be safe setting ulimit -u to 2*nOSDs + 1?

The messenger currently generates 2 threads per daemon it communicates
with (although they will go away after a long enough idle period).
2*nOSD+1 won't quite be enough as there's the monitor connection and a
handful of internal threads (I don't remember the exact numbers
off-hand).
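
To put rough numbers on your case (illustrative, not a guarantee):

  2 threads x 1056 OSDs            = 2112
  + 2 for the monitor connection   = 2114
  + internal threads               = a few dozen more

so the 4096 you've already set should leave comfortable headroom.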

So far this hasn't been a problem for anybody and I doubt you'll see
issues, but at some point we will need to switch the messenger to use
epoll instead of a thread per socket. :)

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
