Re: Pool creation fails: 'failed run crushtool: fork failed: (12) Cannot allocate memory'

On Mon, Jul 25, 2016 at 09:47:10AM +0200, Wido den Hollander wrote:
> 
> > Op 22 juli 2016 om 12:46 schreef Brad Hubbard <bhubbard@xxxxxxxxxx>:
> > 
> > 
> > On Fri, Jul 22, 2016 at 11:41:26AM +0200, Wido den Hollander wrote:
> > > 
> > > > Op 22 juli 2016 om 1:57 schreef Brad Hubbard <bhubbard@xxxxxxxxxx>:
> > > > 
> > > > 
> > > > On Thu, Jul 21, 2016 at 08:41:42PM +0200, Wido den Hollander wrote:
> > > > > 
> > > > > > Op 21 juli 2016 om 20:01 schreef Gregory Farnum <gfarnum@xxxxxxxxxx>:
> > > > > > 
> > > > > > 
> > > > > > On Thu, Jul 21, 2016 at 8:49 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > On a CentOS 7 system with a Jewel 10.2.2 cluster I'm trying to create a pool which fails.
> > > > > > >
> > > > > > > Any pool I try to create, with or without a ruleset applied to it, fails with this error:
> > > > > > >
> > > > > > > "Error ENOMEM: crushtool check failed with -12: failed run crushtool: fork failed: (12) Cannot allocate memory"
> > > > > > >
> > > > > > > At first I thought it was a package version mismatch, but that doesn't seem to be the case.
> > > > > > >
> > > > > > > I see other commands, such as 'radosgw-admin', failing with -12 error codes as well.
> > > > > > >
> > > > > > > Any ideas what might be going on here? The system has roughly 29GB of free memory, so that should be sufficient.
> > > > > > 
> > > > > > ulimits?
> > > > > 
> > > > > Good suggestion, didn't check that, but after looking at them I don't think they are:
> > > > > 
> > > > > [root@srv-zmb16-21 ~]# ulimit -a
> > > > > core file size          (blocks, -c) 0
> > > > > data seg size           (kbytes, -d) unlimited
> > > > > scheduling priority             (-e) 0
> > > > > file size               (blocks, -f) unlimited
> > > > > pending signals                 (-i) 128505
> > > > > max locked memory       (kbytes, -l) 64
> > > > > max memory size         (kbytes, -m) unlimited
> > > > > open files                      (-n) 65536
> > > > > pipe size            (512 bytes, -p) 8
> > > > > POSIX message queues     (bytes, -q) 819200
> > > > > real-time priority              (-r) 0
> > > > > stack size              (kbytes, -s) 8192
> > > > > cpu time               (seconds, -t) unlimited
> > > > > max user processes              (-u) 4096
> > > > > virtual memory          (kbytes, -v) unlimited
> > > > > file locks                      (-x) unlimited
> > > > > [root@srv-zmb16-21 ~]#
> > > > > 
> > > > > Wouldn't you say?
> > > > 
> > > > 
> > > > As a test try doubling vm.max_map_count. We've seen the ENOMEM before in cases
> > > > where the number of memory allocations mapped by a process exceeded this value.
> > > > Note that if this is the issue it likely indicates somewhere in excess of
> > > > 32700 threads are being created so you may want to look at just how many
> > > > threads *are* being created when this issue is seen as well as taking a look
> > > > at the /proc/<PID>/maps file for the process to verify the number of
> > > > allocations. If you are seeing > 32700 threads created we should look at
> > > > whether that number makes sense in your environment.
> > > > 
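A minimal sketch for acting on the suggestion to count threads and inspect /proc/<PID>/maps. It is an illustration in Python, not a Ceph tool: it counts a process's threads via /proc/<PID>/task and its memory mappings via /proc/<PID>/maps, then compares the mapping count with vm.max_map_count. It assumes Linux and enough privilege (typically root) to read another process's maps file; the function names are made up for this example.

```python
import os
import sys


def count_threads(pid: int) -> int:
    """Each thread of a process shows up as a directory under /proc/<pid>/task."""
    return len(os.listdir(f"/proc/{pid}/task"))


def count_mappings(pid: int) -> int:
    """Each line of /proc/<pid>/maps describes one memory mapping."""
    with open(f"/proc/{pid}/maps") as f:
        return sum(1 for _ in f)


def max_map_count() -> int:
    """Current value of the vm.max_map_count sysctl."""
    with open("/proc/sys/vm/max_map_count") as f:
        return int(f.read().strip())


def report(pid: int) -> dict:
    """Gather thread count, mapping count and remaining headroom for one PID."""
    limit = max_map_count()
    maps = count_mappings(pid)
    return {
        "pid": pid,
        "threads": count_threads(pid),
        "mappings": maps,
        "max_map_count": limit,
        "headroom": limit - maps,
    }


if __name__ == "__main__":
    # Inspect the PID given on the command line, or this script's own process.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    for key, value in report(pid).items():
        print(f"{key}: {value}")
```

Running it against the ceph-mon PID while a pool create fails would show whether the mapping count is actually close to the limit.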
> > > 
> > > Thanks for the pointers. It wasn't THE solution, but you brought me to the right area to look.
> > > 
> > > I ran the Ceph command with '--debug-ms=10' and this led me to:
> > > 
> > > 2016-07-22 10:57:25.387445 7f593657c700  1 -- [2a04:XX:XX:70d4:ec4:7aff:febc:15f0]:0/1657730817 <== mon.0 [2a04:XX:XX:70d4:ec4:7aff:febc:15f0]:6789/0 8 ==== mon_command_ack([{"prefix": "osd pool create", "pg_num": 512, "erasure_code_profile": "prd-zm1-hdd", "pool": ".zm1.rgw.meta", "pgp_num": 512}]=-12 crushtool check failed with -12: failed run crushtool: fork failed: (12) Cannot allocate memory v135559) v1 ==== 253+0+0 (3230086078 0 0) 0x7f59200042e0 con 0x7f593805f290
> > > 
> > > What I saw was that it was the *monitor* sending back the ENOMEM, not my local system.
> > > 
> > > It was running into some limits. Keep in mind, this is an 1800-OSD cluster with 5 monitors.
> > > 
> > > So I upped a few limits:
> > > 
> > > - vm.max_map_count=262144
> > > - kernel.pid_max=4194303
> > > 
> > > Didn't work, so I looked at the systemd service definition:
> > > 
> > > In ceph-mon@.service I updated these lines:
> > > 
> > > LimitNOFILE=1048576
> > > LimitNPROC=1048576
> > > 
> > > to:
> > > 
> > > LimitNOFILE=-1
> > > LimitNPROC=-1
> > > 
> > > After a systemd reload and restart of the monitors it all succeeded.
> > > 
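To confirm that raised sysctls are actually in effect on each monitor host, a small Python sketch can read them back from /proc/sys. The TARGETS values are the ones quoted in this message; the helper names are hypothetical. Note that LimitNOFILE/LimitNPROC from the unit file are per-process rlimits and do not appear here; those show up in /proc/<PID>/limits.

```python
from pathlib import Path

# Values raised on the monitor hosts in this thread.
TARGETS = {
    "vm.max_map_count": 262144,
    "kernel.pid_max": 4194303,
}


def sysctl_path(name: str) -> Path:
    """Map a dotted sysctl name to its file under /proc/sys.
    Naive: names that contain literal dots (e.g. VLAN interfaces) are not handled."""
    return Path("/proc/sys") / name.replace(".", "/")


def read_sysctl(name: str) -> int:
    """Read an integer-valued sysctl."""
    return int(sysctl_path(name).read_text().strip())


def check(targets: dict) -> dict:
    """Return {name: (current, wanted, ok)} where ok means current >= wanted."""
    result = {}
    for name, wanted in targets.items():
        current = read_sysctl(name)
        result[name] = (current, wanted, current >= wanted)
    return result


if __name__ == "__main__":
    for name, (current, wanted, ok) in check(TARGETS).items():
        print(f"{name}: current={current} wanted>={wanted} {'OK' if ok else 'LOW'}")
```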
> > 
> > Nicely done.
> > 
> > It seems excessive that it would be using over one million file descriptors or
> > the same number of processes, though?
> > 
> 
> Yes, I find it excessive as well. But this solved the issue at hand for the moment.
> 
> > Maybe we should gather lsof and ps output (ps probably needs one of the options
> > under "THREAD DISPLAY" in the man page to capture thread IDs as well as PIDs)
> > and try to work out why so many of these resources are being allocated?
> > 
> 
> I took a quick look at the primary monitor:
> 
> [root@srv-zmb03-05 ceph]# lsof -np 24517|wc -l
> 832
> [root@srv-zmb03-05 ceph]#
> 
> [root@srv-zmb03-05 ceph]# cat /proc/24517/limits  
> Limit                     Soft Limit           Hard Limit           Units     
> Max cpu time              unlimited            unlimited            seconds   
> Max file size             unlimited            unlimited            bytes     
> Max data size             unlimited            unlimited            bytes     
> Max stack size            8388608              unlimited            bytes     
> Max core file size        0                    unlimited            bytes     
> Max resident set          unlimited            unlimited            bytes     
> Max processes             unlimited            unlimited            processes 
> Max open files            65536                65536                files     
> Max locked memory         65536                65536                bytes     
> Max address space         unlimited            unlimited            bytes     
> Max file locks            unlimited            unlimited            locks     
> Max pending signals       128472               128472               signals   
> Max msgqueue size         819200               819200               bytes     
> Max nice priority         0                    0                    
> Max realtime priority     0                    0                    
> Max realtime timeout      unlimited            unlimited            us        
> [root@srv-zmb03-05 ceph]#
> 
> So it seems this monitor ran into LimitNPROC; the number of open files is so low that it probably didn't hit LimitNOFILE.

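To see which limits a running daemon actually has (rather than what the shell's ulimit reports), /proc/<PID>/limits can be parsed. A minimal Python sketch, assuming the fixed column widths the Linux kernel uses when printing that file (proc_pid_limits in fs/proc/base.c); the function names are illustrative only.

```python
import os
import sys

# Column layout of /proc/<pid>/limits: the name is padded to 25 chars and the
# soft and hard values to 20 chars, each followed by one space; an optional
# units column comes last.
NAME_END, SOFT_END, HARD_END = 26, 47, 68


def parse_limits(text: str) -> dict:
    """Parse /proc/<pid>/limits into {name: (soft, hard, units)}.
    Values stay strings because 'unlimited' is a valid value."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        name = line[:NAME_END].strip()
        soft = line[NAME_END:SOFT_END].strip()
        hard = line[SOFT_END:HARD_END].strip()
        units = line[HARD_END:].strip()
        limits[name] = (soft, hard, units)
    return limits


def read_limits(pid: int) -> dict:
    """Read and parse the limits of a running process."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_limits(f.read())


if __name__ == "__main__":
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    limits = read_limits(pid)
    # The two limits discussed in this thread.
    for name in ("Max processes", "Max open files"):
        soft, hard, _ = limits[name]
        print(f"{name}: soft={soft} hard={hard}")
```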
I suspect we are missing something here. It seems unlikely to me that you are
hitting such a high process limit and, if you were, I would expect the error
returned to be something other than ENOMEM (fork() fails with EAGAIN when
RLIMIT_NPROC is reached). I admit the evidence so far appears to indicate
that this limit is somehow involved, but I suspect it is not the entire story
and there is more to be learned.
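Ceph reports the failure as -12, a negative errno. A small Python sketch for decoding such codes, annotated with the fork() causes listed in the Linux fork(2) man page: EAGAIN covers RLIMIT_NPROC, kernel.threads-max, kernel.pid_max and pids cgroup limits, while ENOMEM means the kernel could not allocate the structures needed for the child. describe_ceph_error is a hypothetical helper, not part of any Ceph tooling.

```python
import errno
import os

# Abbreviated fork(2) error causes (Linux man page).
FORK_ERRNO_CAUSES = {
    errno.EAGAIN: "a system-imposed limit on threads/processes was hit "
                  "(e.g. RLIMIT_NPROC, kernel.threads-max, kernel.pid_max)",
    errno.ENOMEM: "the kernel could not allocate the structures needed "
                  "for the child because memory is tight",
}


def describe_ceph_error(code: int) -> str:
    """Translate a (negative) errno into its name and message and, where
    known, the matching fork(2) cause."""
    num = abs(code)
    name = errno.errorcode.get(num, "UNKNOWN")
    text = f"{name} ({num}): {os.strerror(num)}"
    cause = FORK_ERRNO_CAUSES.get(num)
    return f"{text}; if from fork(): {cause}" if cause else text


if __name__ == "__main__":
    print(describe_ceph_error(-12))
    print(describe_ceph_error(-errno.EAGAIN))
```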

> 
> Wido
> 
> > HTH,
> > Brad
> > 
> > > Wido
> > > 
> > > > HTH,
> > > > Brad
> > > > 
> > > > > 
> > > > > I also checked SELinux; it is disabled ('setenforce 0').
> > > > > 
> > > > > Wido
> > > > > 
> > > > > > --
> > > > > > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > > > > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > -- 
> > Cheers,
> > Brad

-- 
Cheers,
Brad