On Mon, Dec 16, 2013 at 11:08 AM, Dan van der Ster <daniel.vanderster@xxxxxxx> wrote:
> Hi,
>
> Sorry to revive this old thread, but I wanted to update you on the current
> pains we're going through related to clients' nproc (and now nofile)
> ulimits. When I started this thread we were using RBD for Glance images
> only, but now we're trying to enable RBD-backed Cinder volumes and are not
> really succeeding at the moment :(
>
> As we had guessed from our earlier experience, librbd and therefore qemu-kvm
> need increased nproc/nofile limits, otherwise VMs will freeze. In fact we
> just observed a lockup of a test VM due to the RBD device blocking
> completely (this appears as blocked flush processes in the VM); we're
> actually not sure which of the nproc/nofile limits caused the freeze, but it
> was surely one of those.
>
> And the main problem we face now is that it isn't trivial to increase the
> limits of qemu-kvm on a running OpenStack hypervisor -- the values are set
> by libvirtd and seem to require a restart of all guest VMs on a host to
> reload a qemu config file. I'll update this thread when we find the solution
> to that...

Is there some reason you can't just set it ridiculously high to start with?

> Moving forward, IMHO it would be much better if Ceph clients could
> gracefully work with large clusters without _requiring_ changes to the
> ulimits. I understand that such poorly configured clients would necessarily
> have decreased performance (since librados would need to use a thread pool
> and also lose some of the persistent client-OSD connections). But client
> lockups are IMHO worse than slightly lower performance.
>
> Have you guys discussed the client ulimit issues recently, and is there a
> plan in the works?

I'm afraid not. It's a plannable but non-trivial amount of work, and the
Inktank dev team is pretty well booked for a while. Anybody running into
this as a serious bottleneck should 1) try to start a community effort, or
2) try to promote it as a priority with any Inktank business contacts they
have. (You are only the second group to report it as an ongoing concern
rather than a one-off hiccup, and honestly it sounds like you're just
hitting the arbitrary limits rather than running into real resource
exhaustion.) :)
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
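
For the running-hypervisor case Dan describes, one option is to bump the
limits of the already-running qemu-kvm processes in place via the prlimit(2)
syscall instead of restarting guests. Below is a minimal sketch, assuming
Linux, Python >= 3.6 run as root; the limit values (65536 / 32768) and the
"qemu" process-name match are purely illustrative, not values taken from
this thread.

#!/usr/bin/env python3
# Sketch: raise RLIMIT_NOFILE / RLIMIT_NPROC on running qemu processes
# with prlimit(2), so guests do not need to be restarted. Requires root.
import os
import resource

# Example values only -- size these for your own cluster/OSD count.
NEW_NOFILE = (65536, 65536)   # (soft, hard) open-file limit
NEW_NPROC = (32768, 32768)    # (soft, hard) process/thread limit

def qemu_pids():
    """Yield PIDs whose executable name looks like qemu-kvm / qemu-system-*."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited between listdir() and open()
        if comm.startswith("qemu"):
            yield int(entry)

for pid in qemu_pids():
    old_nofile = resource.prlimit(pid, resource.RLIMIT_NOFILE, NEW_NOFILE)
    old_nproc = resource.prlimit(pid, resource.RLIMIT_NPROC, NEW_NPROC)
    print(f"pid {pid}: nofile {old_nofile} -> {NEW_NOFILE}, "
          f"nproc {old_nproc} -> {NEW_NPROC}")

For guests started after the change, the "qemu config file" mentioned above
is, if I'm reading the thread right, libvirt's qemu.conf, whose
max_processes and max_files settings control the limits libvirtd applies to
newly spawned qemu processes.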