Resolved. The cause of the 57th OSD not starting was an invalid FD error when creating the journal, which turned out to be depletion of the kernel's aio resources. Raising fs.aio-max-nr broke through the fault:

sudo sysctl -w fs.aio-max-nr=50000000   # probably way too high, but this did break through the fault

After this, the 57th OSD started. (The sizing arithmetic and a persistent way to set the limit are sketched further down in the thread.)

Settings that existed during the problem; everest-0-1 is the system with the 57 OSDs and everest-0-0 does not have any:

[cephadmin@everest-0-1 ~]$ cat /proc/sys/fs/aio-max-nr
65536
[cephadmin@everest-0-0 ~]$ cat /proc/sys/fs/aio-max-nr
65536
[cephadmin@everest-0-1 ~]$ cat /proc/sys/fs/aio-nr
129024
[cephadmin@everest-0-0 ~]$ cat /proc/sys/fs/aio-nr
0

After the change and starting the 57th OSD:

[cephadmin@everest-0-1 ~]$ sudo sysctl -a | grep aio
fs.aio-max-nr = 50000000
fs.aio-nr = 131328

So each OSD consumes about 2,304 aio resources.

The other parts of the original solution are:

1) Use the backported ceph-disk for Infernalis.
2) This bug report helped in understanding the problem: http://tracker.ceph.com/issues/9073
3) Edit the CEPH_AUTO_RESTART_ON_UPGRADE setting in /etc/sysconfig/ceph (it was no; I changed it to yes, as described in the message below).
4) Use ceph-deploy osd prepare instead of prepare/activate or create; with the udev and systemd integration, activation is handled by systemd.

Raising the nofile limit did not have any effect.

Walker

-----Original Message-----
From: Walker H Haddock
Sent: Sunday, January 31, 2016 7:50 PM
To: 'Walker H Haddock'; Ceph Development
Subject: RE: failure to activate with Infernalis on CentOS 7.2.1511

Hello ceph developers,

I have made some progress. Following the documentation for the latest release of ceph-deploy, and using the ceph-deploy create command instead of the prepare/activate sequence documented in the quick start on master, I was able to get past the key problem I had reported. However, the OSDs were not starting. After looking through the systemd scripts, finding the /etc/sysconfig/ceph setting CEPH_AUTO_RESTART_ON_UPGRADE=no, and changing the value to yes, I have been able to create OSDs.

Now my problem is creating more than 56 OSDs on the same host: the 57th OSD does not start, and I get the following in /var/log/messages:

Jan 31 17:24:04 everest-0-1 flock: ceph-disk: Running command: /usr/sbin/ceph-disk activate /dev/sdbj1
Jan 31 17:24:04 everest-0-1 systemd: start request repeated too quickly for ceph-disk@-dev-sdbj1.service
Jan 31 17:24:04 everest-0-1 systemd: Failed to start Ceph disk activation: /dev/sdbj1.
Jan 31 17:24:04 everest-0-1 systemd: ceph-disk@-dev-sdbj1.service failed.

Thanks

-----Original Message-----
From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Walker H Haddock
Sent: Friday, January 29, 2016 3:14 PM
To: Ceph Development
Subject: failure to activate with Infernalis on CentOS 7.2.1511

I am getting the following authentication error when running ceph-deploy activate after running ceph-deploy prepare.
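To put the aio numbers from the resolution at the top of this thread in perspective: at roughly 2,304 aio contexts per OSD, the stock fs.aio-max-nr of 65536 covers only about 28 OSDs, while 57 OSDs need about 131,328. Below is a minimal sketch of making a raised limit survive reboots; the drop-in file name 99-ceph-aio.conf and the 1,000,000 value are illustrative choices, not something taken from the messages in this thread.

# write a persistent drop-in (file name is an illustrative choice)
echo "fs.aio-max-nr = 1000000" | sudo tee /etc/sysctl.d/99-ceph-aio.conf
# apply all sysctl drop-ins without rebooting
sudo sysctl --system
# compare current usage against the new ceiling
cat /proc/sys/fs/aio-nr /proc/sys/fs/aio-max-nr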
I have installed the backported version of ceph-disk: http://tracker.ceph.com/issues/14080

ceph.x86_64 1:9.2.0-0.el7
Linux everest-1-3 3.18.13 #1 SMP Sat May 16 17:14:47 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

ceph-deploy --overwrite-conf osd prepare --fs-type btrfs everest-0-0:/dev/sdf:/dev/sdc
ceph-deploy osd activate everest-0-0:/dev/sdf1:/dev/sdc1

ceph osd tree:

ID WEIGHT   TYPE NAME             UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 10.91599 root default
-2 10.91599     host everest-0-0
 0  5.45799         osd.0            down        0          1.00000
 1  5.45799         osd.1             DNE        0

ceph-deploy activate output:

[cephadmin@everest-1-3 ceph-everest]$ ceph-deploy osd activate everest-0-0:/dev/sdf1:/dev/sdc1
[ceph_deploy.conf][DEBUG ] found configuration file at: /home/cephadmin/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.31): /usr/bin/ceph-deploy osd activate everest-0-0:/dev/sdf1:/dev/sdc1
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username          : None
[ceph_deploy.cli][INFO  ]  verbose           : False
[ceph_deploy.cli][INFO  ]  overwrite_conf    : False
[ceph_deploy.cli][INFO  ]  subcommand        : activate
[ceph_deploy.cli][INFO  ]  quiet             : False
[ceph_deploy.cli][INFO  ]  cd_conf           : <ceph_deploy.conf.cephdeploy.Conf instance at 0x1140ea8>
[ceph_deploy.cli][INFO  ]  cluster           : ceph
[ceph_deploy.cli][INFO  ]  func              : <function osd at 0x1135320>
[ceph_deploy.cli][INFO  ]  ceph_conf         : None
[ceph_deploy.cli][INFO  ]  default_release   : False
[ceph_deploy.cli][INFO  ]  disk              : [('everest-0-0', '/dev/sdf1', '/dev/sdc1')]
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks everest-0-0:/dev/sdf1:/dev/sdc1
[everest-0-0][DEBUG ] connection detected need for sudo
[everest-0-0][DEBUG ] connected to host: everest-0-0
[everest-0-0][DEBUG ] detect platform information from remote host
[everest-0-0][DEBUG ] detect machine type
[everest-0-0][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.2.1511 Core
[ceph_deploy.osd][DEBUG ] activating host everest-0-0 disk /dev/sdf1
[ceph_deploy.osd][DEBUG ] will use init type: systemd
[everest-0-0][INFO  ] Running command: sudo ceph-disk -v activate --mark-init systemd --mount /dev/sdf1
[everest-0-0][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdf1 uuid path is /sys/dev/block/8:81/dm/uuid
[everest-0-0][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdf1 uuid path is /sys/dev/block/8:81/dm/uuid
[everest-0-0][WARNIN] INFO:ceph-disk:Running command: /sbin/sgdisk -i 1 /dev/sdf
[everest-0-0][WARNIN] INFO:ceph-disk:Running command: /sbin/blkid -p -s TYPE -o value -- /dev/sdf1
[everest-0-0][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_btrfs
[everest-0-0][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_btrfs
[everest-0-0][WARNIN] DEBUG:ceph-disk:Mounting /dev/sdf1 on /var/lib/ceph/tmp/mnt.E7nJsK with options noatime,user_subvol_rm_allowed
[everest-0-0][WARNIN] INFO:ceph-disk:Running command: /usr/bin/mount -t btrfs -o noatime,user_subvol_rm_allowed -- /dev/sdf1 /var/lib/ceph/tmp/mnt.E7nJsK
[everest-0-0][WARNIN] INFO:ceph-disk:Running command: /sbin/restorecon /var/lib/ceph/tmp/mnt.E7nJsK
[everest-0-0][WARNIN] DEBUG:ceph-disk:Cluster uuid is fdb615fe-cd40-4690-afd0-d63a2260da24
[everest-0-0][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[everest-0-0][WARNIN] DEBUG:ceph-disk:Cluster name is ceph
[everest-0-0][WARNIN] DEBUG:ceph-disk:OSD uuid is c5a11f43-7ac8-4f20-bc17-531c02e4773d
[everest-0-0][WARNIN] DEBUG:ceph-disk:OSD id is 0
[everest-0-0][WARNIN] DEBUG:ceph-disk:Marking with init system systemd
[everest-0-0][WARNIN] DEBUG:ceph-disk:Authorizing OSD key...
[everest-0-0][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring auth add osd.0 -i /var/lib/ceph/tmp/mnt.E7nJsK/keyring osd allow * mon allow profile osd
[everest-0-0][WARNIN] Error EINVAL: entity osd.0 exists but key does not match
[everest-0-0][WARNIN] ERROR:ceph-disk:Failed to activate
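The EINVAL above ("entity osd.0 exists but key does not match") indicates the monitors already hold a cephx entry for osd.0 whose key differs from the one in the freshly prepared OSD's data directory. A minimal recovery sketch, assuming the existing osd.0 entry (shown down in the tree above) is a leftover that is safe to discard:

# inspect the entry the monitors currently hold
ceph auth get osd.0
# remove the stale entry so activation can register the new key
ceph auth del osd.0
# retry the activation from the admin node
ceph-deploy osd activate everest-0-0:/dev/sdf1:/dev/sdc1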
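On the "start request repeated too quickly" lines quoted in the middle message of this thread: that error means systemd's start-rate limit for the ceph-disk unit had been hit after repeated failures. Once the underlying cause (the aio ceiling) is fixed, one way to retry without rebooting is to clear the unit's failed state; a sketch using the unit name from that log:

sudo systemctl reset-failed ceph-disk@-dev-sdbj1.service
sudo systemctl start ceph-disk@-dev-sdbj1.service
sudo systemctl status ceph-disk@-dev-sdbj1.service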