Re: ulimit max user processes (-u) and non-root ceph clients

Dan Van Der Ster <daniel.vanderster@xxxxxxx> · Mon, 16 Dec 2013 19:36:18 +0000

On Dec 16, 2013 8:26 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>
> On Mon, Dec 16, 2013 at 11:08 AM, Dan van der Ster
> <daniel.vanderster@xxxxxxx> wrote:
> > Hi,
> >
> > Sorry to revive this old thread, but I wanted to update you on the current
> > pains we're going through related to clients' nproc (and now nofile)
> > ulimits. When I started this thread we were using RBD for Glance images
> > only, but now we're trying to enable RBD-backed Cinder volumes and are not
> > really succeeding at the moment :(
> >
> > As we had guessed from our earlier experience, librbd and therefore qemu-kvm
> > need increased nproc/nofile limits; otherwise VMs will freeze. In fact we
> > just observed a lockup of a test VM due to the RBD device blocking
> > completely (this appears as blocked flush processes in the VM); we're
> > actually not sure which of the nproc/nofile limits caused the freeze, but it
> > was surely one of those.
> >
> > And the main problem we face now is that it isn't trivial to increase the
> > limits of qemu-kvm on a running OpenStack hypervisor -- the values are set
> > by libvirtd and seem to require a restart of all guest VMs on a host to
> > reload a qemu config file. I'll update this thread when we find the solution
> > to that...
>
> Is there some reason you can't just set it ridiculously high to start with?
>

As I mentioned, we haven't yet found a way to change the limits without stopping the (important) VMs that are already running. We thought that /etc/security/limits.conf would do the trick, but alas, limits set there have no effect on qemu.
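
A minimal sketch of how the effective nproc/nofile limits of an already-running process could be checked, and how a soft limit might be raised in place. It assumes a Linux host with Python 3.4+ (resource.prlimit needs kernel 2.6.36+ and glibc 2.13+). The helper names and the sample values are illustrative and not taken from this thread, and whether raising the limit on a live qemu-kvm process is enough to avoid the lockups is untested here.

```python
import re
import resource

# Illustrative excerpt in the layout of /proc/<pid>/limits (values are made up).
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max processes             1024                 127513               processes
Max open files            1024                 4096                 files
"""


def parse_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard)}.

    Values are kept as strings because they may be the word 'unlimited'.
    """
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Columns are padded with runs of spaces; names contain single spaces.
        cols = re.split(r"\s{2,}", line.strip())
        if len(cols) < 3:
            continue
        limits[cols[0]] = (cols[1], cols[2])
    return limits


def raise_soft_limit(pid, res, wanted):
    """Raise the soft limit of `pid` towards `wanted`, capped at the hard limit.

    `wanted` is a finite integer. Returns the resulting (soft, hard) pair.
    Changing another user's process needs CAP_SYS_RESOURCE (typically root).
    """
    soft, hard = resource.prlimit(pid, res)  # read-only call
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    if soft == resource.RLIM_INFINITY or wanted <= soft:
        return soft, hard  # nothing to raise
    resource.prlimit(pid, res, (wanted, hard))
    return resource.prlimit(pid, res)
```

For a real process, the text would come from open("/proc/%d/limits" % pid).read(), and the resource constants of interest are resource.RLIMIT_NPROC and resource.RLIMIT_NOFILE. Only the soft limit is touched, so the hard limit set by whatever launched the process still applies.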

Cheers, Dan

> > Moving forward, IMHO it would be much better if Ceph clients could
> > gracefully work with large clusters without _requiring_ changes to the
> > ulimits. I understand that such poorly configured clients would necessarily
> > have decreased performance (since librados would need to use a thread pool
> > and also lose some of the persistent client-OSD connections). But client
> > lockups are IMHO worse than slightly lower performance.
> >
> > Have you guys discussed the client ulimit issues recently and is there a
> > plan in the works?
>
> I'm afraid not. It's a plannable but non-trivial amount of work and
> the Inktank dev team is pretty well booked for a while. Anybody
> running into this as a serious bottleneck should
> 1) try and start a community effort
> 2) try and promote it as a priority with any Inktank business contacts
> they have.
> (You are only the second group to report it as an ongoing concern
> rather than a one-off hiccup, and honestly it sounds like you're just
> having issues with hitting the arbitrary limits, not with real
> resource exhaustion issues.)
> :)
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com