Re: ulimit max user processes (-u) and non-root ceph clients

On Mon, Dec 16, 2013 at 11:08 AM, Dan van der Ster
<daniel.vanderster@xxxxxxx> wrote:
> Hi,
>
> Sorry to revive this old thread, but I wanted to update you on the current
> pains we're going through related to clients' nproc (and now nofile)
> ulimits. When I started this thread we were using RBD for Glance images
> only, but now we're trying to enable RBD-backed Cinder volumes and are not
> really succeeding at the moment :(
>
> As we had guessed from our earlier experience, librbd and therefore qemu-kvm
> need increased nproc/nofile limits; otherwise VMs will freeze. In fact we
> just observed a lockup of a test VM due to the RBD device blocking
> completely (this appears as blocked flush processes in the VM); we're
> actually not sure which of the nproc/nofile limits caused the freeze, but it
> was surely one of those.
>
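If it helps to pin down which of the two limits a guest is actually
approaching, comparing a qemu-kvm process's live thread/fd counts against
/proc/<pid>/limits is usually enough. A minimal Python sketch, assuming a
stock Linux /proc layout; note that nproc is accounted per UID, so for the
real picture you'd sum the thread counts of every qemu process running as
that user:

    import os
    import sys

    def limit_usage(pid):
        # live usage of the two resources in question
        usage = {
            "threads": len(os.listdir("/proc/%d/task" % pid)),
            "open_fds": len(os.listdir("/proc/%d/fd" % pid)),
        }
        # soft limits as the kernel applies them to this process
        limits = {}
        with open("/proc/%d/limits" % pid) as f:
            for line in f:
                if line.startswith("Max processes"):
                    limits["threads"] = line.split()[2]   # soft nproc (may be 'unlimited')
                elif line.startswith("Max open files"):
                    limits["open_fds"] = line.split()[3]  # soft nofile
        return usage, limits

    if __name__ == "__main__":
        usage, limits = limit_usage(int(sys.argv[1]))
        print("usage: ", usage)
        print("limits:", limits)
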
> And the main problem we face now is that it isn't trivial to increase the
> limits of qemu-kvm on a running OpenStack hypervisor -- the values are set
> by libvirtd and seem to require a restart of all guest VMs on a host to
> reload a qemu config file. I'll update this thread when we find the solution
> to that...
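
One possible way around the restart, at least for guests that are already
running: on reasonably recent kernels (2.6.36+) the prlimit() syscall can
raise another process's limits in place, and Python 3.4+ exposes it as
resource.prlimit(). A rough sketch, run as root on the hypervisor against
the qemu-kvm pid (the 65536 values are just placeholders):

    import resource

    def bump(pid, res, wanted):
        # Raise one rlimit of a live process via prlimit(2). Raising another
        # process's limits needs root / CAP_SYS_RESOURCE; never lowers anything.
        soft, hard = resource.prlimit(pid, res)
        inf = resource.RLIM_INFINITY
        new_soft = soft if soft == inf else max(soft, wanted)
        new_hard = hard if hard == inf else max(hard, wanted)
        if (new_soft, new_hard) != (soft, hard):
            resource.prlimit(pid, res, (new_soft, new_hard))

    def raise_limits(pid, nofile=65536, nproc=65536):
        bump(pid, resource.RLIMIT_NOFILE, nofile)
        bump(pid, resource.RLIMIT_NPROC, nproc)

    # e.g.  raise_limits(12345)   # pid of the qemu-kvm process
    # then re-check /proc/12345/limits to confirm

The new values only apply to whatever the process allocates from then on,
so whether an already-frozen guest recovers is another question.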

Is there some reason you can't just set it ridiculously high to start with?
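
Concretely, "ridiculously high to start with" would presumably mean raising
the hypervisor-wide defaults in /etc/security/limits.conf plus, if memory
serves, the max_processes / max_files knobs in /etc/libvirt/qemu.conf, so
that freshly started guests inherit big limits; the pain described above is
specifically about guests that are already running.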

> Moving forward, IMHO it would be much better if Ceph clients could
> gracefully work with large clusters without _requiring_ changes to the
> ulimits. I understand that such poorly configured clients would necessarily
> have decreased performance (since librados would need to use a thread pool
> and also lose some of the persistent client-OSD connections). But client
> lockups are IMHO worse than slightly lower performance.
>
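For a sense of scale on why the defaults run out: each attached RBD volume
is its own librados instance, and with the current messenger every librados
instance keeps roughly one socket plus a couple of threads per OSD it has
talked to. A back-of-envelope sketch; the 2-threads/1-fd-per-OSD-session
figures are rule-of-thumb assumptions, not exact numbers:

    def client_footprint(num_osds, volumes_per_vm,
                         threads_per_session=2, fds_per_session=1):
        # worst case: every volume's librados instance ends up with a
        # session to every OSD in the cluster
        sessions = num_osds * volumes_per_vm
        return {"threads": sessions * threads_per_session,
                "open_fds": sessions * fds_per_session}

    # a 1000-OSD cluster and a guest with 4 attached volumes:
    #   roughly 8000 threads and 4000 fds for one qemu-kvm process,
    #   versus the common 1024 soft defaults for nproc/nofile
    print(client_footprint(1000, 4))
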
> Have you guys discussed the client ulimit issues recently and is there a
> plan in the works?

I'm afraid not. It's a plannable but non-trivial amount of work and
the Inktank dev team is pretty well booked for a while. Anybody
running into this as a serious bottleneck should
1) try and start a community effort
2) try and promote it as a priority with any Inktank business contacts
they have.
(You are only the second group to report it as an ongoing concern
rather than a one-off hiccup, and honestly it sounds like you're just
having issues with hitting the arbitrary limits, not with real
resource exhaustion issues.)
:)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com



