> On 22 July 2016 at 1:57, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
>
>
> On Thu, Jul 21, 2016 at 08:41:42PM +0200, Wido den Hollander wrote:
> >
> > > On 21 July 2016 at 20:01, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> > >
> > >
> > > On Thu, Jul 21, 2016 at 8:49 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
> > > > Hi,
> > > >
> > > > On a CentOS 7 system with a Jewel 10.2.2 cluster I'm trying to create a pool, which fails.
> > > >
> > > > Any pool I try to create, with or without a ruleset applied to it, fails with this error:
> > > >
> > > > "Error ENOMEM: crushtool check failed with -12: failed run crushtool: fork failed: (12) Cannot allocate memory"
> > > >
> > > > At first I thought it was a package version mismatch, but that doesn't seem to be the case.
> > > >
> > > > There are other commands, like 'radosgw-admin', that I see fail with -12 error codes as well.
> > > >
> > > > Any ideas what might be going on here? The system has roughly 29GB of free memory, so that should be sufficient.
> > >
> > > ulimits?
> >
> > Good suggestion. I hadn't checked that, but after looking at them I don't think they are the problem:
> >
> > [root@srv-zmb16-21 ~]# ulimit -a
> > core file size          (blocks, -c) 0
> > data seg size           (kbytes, -d) unlimited
> > scheduling priority             (-e) 0
> > file size               (blocks, -f) unlimited
> > pending signals                 (-i) 128505
> > max locked memory       (kbytes, -l) 64
> > max memory size         (kbytes, -m) unlimited
> > open files                      (-n) 65536
> > pipe size            (512 bytes, -p) 8
> > POSIX message queues     (bytes, -q) 819200
> > real-time priority              (-r) 0
> > stack size              (kbytes, -s) 8192
> > cpu time               (seconds, -t) unlimited
> > max user processes              (-u) 4096
> > virtual memory          (kbytes, -v) unlimited
> > file locks                      (-x) unlimited
> > [root@srv-zmb16-21 ~]#
> >
> > Wouldn't you say?
>
> As a test, try doubling vm.max_map_count. We've seen this ENOMEM before in cases
> where the number of memory mappings held by a process exceeded that value.
> Note that if this is the issue, it likely indicates somewhere in excess of
> 32700 threads are being created, so you may want to look at just how many
> threads *are* being created when this issue is seen, as well as take a look
> at the /proc/<PID>/maps file for the process to verify the number of
> mappings. If you are seeing > 32700 threads created, we should look at
> whether that number makes sense in your environment.

Thanks for the pointers. It wasn't THE solution, but it pointed me to the right area. I ran the Ceph command with '--debug-ms=10' and this led me to:

2016-07-22 10:57:25.387445 7f593657c700 1 -- [2a04:XX:XX:70d4:ec4:7aff:febc:15f0]:0/1657730817 <== mon.0 [2a04:XX:XX:70d4:ec4:7aff:febc:15f0]:6789/0 8 ==== mon_command_ack([{"prefix": "osd pool create", "pg_num": 512, "erasure_code_profile": "prd-zm1-hdd", "pool": ".zm1.rgw.meta", "pgp_num": 512}]=-12 crushtool check failed with -12: failed run crushtool: fork failed: (12) Cannot allocate memory v135559) v1 ==== 253+0+0 (3230086078 0 0) 0x7f59200042e0 con 0x7f593805f290

What I saw is that it was the *monitor* sending back the ENOMEM, not my local system. The monitor itself was running into limits. Keep in mind, this is an 1800 OSD cluster with 5 monitors.

So I raised a few sysctls:

- vm.max_map_count=262144
- kernel.pid_max=4194303

That didn't work, so I looked at the systemd service definition. In ceph-mon@.service I updated these lines:

LimitNOFILE=1048576
LimitNPROC=1048576

to:

LimitNOFILE=-1
LimitNPROC=-1

After a systemd daemon-reload and a restart of the monitors, pool creation succeeded.
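For reference, the command that produced the ack above was along these lines; the pool name, PG counts, and erasure-code profile are taken from the mon_command in the log, so treat the exact invocation as a reconstruction:

  ceph osd pool create .zm1.rgw.meta 512 512 erasure prd-zm1-hdd --debug-ms=10

The --debug-ms=10 just makes the client print its messenger traffic, which is how the mon_command_ack became visible.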
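If anyone wants to check Brad's numbers on their own cluster before changing anything, something like this (looking up the ceph-mon PID with pidof is just one way to get it; adjust for your setup):

  sysctl vm.max_map_count                    # current mapping limit
  MON_PID=$(pidof ceph-mon)
  wc -l /proc/$MON_PID/maps                  # memory mappings in use
  ls /proc/$MON_PID/task | wc -l             # thread count

And the sysctls I raised, applied at runtime and persisted across reboots (the file name is my own choice):

  sysctl -w vm.max_map_count=262144
  sysctl -w kernel.pid_max=4194303
  printf 'vm.max_map_count = 262144\nkernel.pid_max = 4194303\n' > /etc/sysctl.d/90-ceph.conf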
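One caveat on my fix: I edited ceph-mon@.service in place, and a package update will overwrite that. A systemd drop-in should survive upgrades; untested here, but standard systemd practice:

  # /etc/systemd/system/ceph-mon@.service.d/limits.conf
  [Service]
  LimitNOFILE=infinity
  LimitNPROC=infinity

Followed by:

  systemctl daemon-reload
  systemctl restart ceph-mon.target    # or the individual ceph-mon@<id> units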
Wido

> HTH,
> Brad
>
> > I also checked, SELinux is disabled ('setenforce 0').
> >
> > Wido
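For the archives: 'setenforce 0' only switches SELinux to permissive mode on the running system. To verify the actual state:

  getenforce    # prints Enforcing / Permissive / Disabled
  sestatus      # fuller status, including the config-file setting

A permanent disable is set via SELINUX=disabled in /etc/selinux/config.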