Re: ulimit max user processes (-u) and non-root ceph clients

On Sep 19, 2013, at 6:10 PM, Gregory Farnum <greg@xxxxxxxxxxx>
 wrote:

> On Wed, Sep 18, 2013 at 11:43 PM, Dan Van Der Ster
> <daniel.vanderster@xxxxxxx> wrote:
>> 
>> On Sep 18, 2013, at 11:50 PM, Gregory Farnum <greg@xxxxxxxxxxx>
>> wrote:
>> 
>>> On Wed, Sep 18, 2013 at 6:33 AM, Dan Van Der Ster
>>> <daniel.vanderster@xxxxxxx> wrote:
>>>> Hi,
>>>> We just finished debugging a problem with RBD-backed Glance image creation failures, and thought our workaround would be useful for others. Basically, we found that during an image upload, librbd on the glance api server was consuming many, many processes, eventually hitting the 1024 nproc limit for non-root users in RHEL. The failure occurred when uploading to pools with 2048 PGs, but not when uploading to pools with 512 PGs (we're guessing that librbd opens one thread per accessed PG, and doesn't close those threads until the whole process completes).
>>>> 
>>>> If you hit this same problem (and you run RHEL like us), you'll need to modify at least /etc/security/limits.d/90-nproc.conf (adding your non-root user that should be allowed more than 1024 procs), and possibly also run ulimit -u in the init script of your client process. Ubuntu should have similar limits.
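
[For reference, a minimal sketch of the override described above; the "glance" user name and the 4096 value are just examples, so adjust them to your own client user and cluster:]

    # /etc/security/limits.d/90-nproc.conf
    # RHEL ships a "*  soft  nproc  1024" entry here; add a higher limit
    # for the non-root user that runs the Ceph client
    glance          soft    nproc     4096

    # and, if needed, raise the soft limit in the client's init script
    # before the daemon is started
    ulimit -u 4096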
>>> 
>>> Did your pools with 2048 PGs have a significantly larger number of
>>> OSDs in them? Or are both pools on a cluster with a lot of OSDs relative
>>> to the PG counts?
>> 
>> 1056 OSDs at the moment.
>> 
>> Uploading a 14 GB image, we observed up to ~1500 threads.
>> 
>> We set the glance client to allow 4096 processes for now.
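
[For anyone who wants to watch for this themselves, a minimal sketch of one way to track a client's thread count on Linux; it just reads /proc, and finding the pid of your librbd client (e.g. the glance-api worker) is left to the caller:]

    # count_threads.py -- print the current thread count of a process (Linux only)
    import sys

    def thread_count(pid):
        # /proc/<pid>/status carries a "Threads:" line with the live count
        with open("/proc/%d/status" % pid) as f:
            for line in f:
                if line.startswith("Threads:"):
                    return int(line.split()[1])

    if __name__ == "__main__":
        print(thread_count(int(sys.argv[1])))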
>> 
>> 
>>> The PG count shouldn't matter for this directly, but RBD (and other
>>> clients) will create a couple of messenger threads for each OSD it talks
>>> to, and while those threads eventually shut down when idle, the client
>>> doesn't proactively close them. I'd expect this to become a problem at
>>> around 500 OSDs.
>> 
>> Is a couple the upper limit? Should we be safe with ulimit -u set to 2*nOSDs + 1?
> 
> The messenger currently generates two threads per daemon it communicates
> with (although they will go away after a long enough idle period).
> 2*nOSD + 1 won't quite be enough, since there's also the monitor connection
> and a handful of internal threads (I don't remember the exact numbers
> offhand).
> 
> So far this hasn't been a problem for anybody and I doubt you'll see
> issues, but at some point we will need to switch the messenger to use
> epoll instead of a thread per socket. :)

So we are the first ;)

Anyway, for now Glance is using ulimit -u 4096, and this is not a big problem since we are in close contact with the people running those tests. But in the future, as the Ceph user base grows here at our lab, it would be friendlier if Ceph clients fit under the default RHEL/Ubuntu ulimits.
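
Based on Greg's numbers above, a rough back-of-envelope for the worst case looks like this (the headroom for the monitor connection and internal threads is a guess, since the exact count isn't stated):

    # rough worst-case client thread estimate, per the two-threads-per-OSD figure
    n_osds = 1056                       # OSDs in our cluster
    headroom = 64                       # monitor connection + internal threads (guess)
    worst_case = 2 * n_osds + headroom  # ~2176 threads
    print(worst_case)                   # so ulimit -u 4096 leaves a comfortable margin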

Cheers, Dan


> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



