Thanks for all the help. For some bizarre reason I had an empty host inside the default root. Once I dumped a "fake" OSD into it, everything started working (a rough sketch of the commands involved is at the bottom of this mail).

On Mon, Aug 20, 2018 at 7:36 PM Daznis <daznis@xxxxxxxxx> wrote:
>
> Hello,
>
> Medic shows everything fine. The whole cluster is on the latest mimic
> version. It was updated to mimic when the stable version of mimic was
> released, and recently it was updated to "ceph version 13.2.1
> (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)". For some
> reason one mgr service is running, but it's not connected to the
> cluster.
>
> Versions output:
>
> {
>     "mon": {
>         "ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)": 3
>     },
>     "mgr": {
>         "ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)": 2
>     },
>     "osd": {
>         "ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)": 47
>     },
>     "mds": {},
>     "overall": {
>         "ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)": 52
>     }
> }
>
> Medic output:
> ======================= Starting remote check session ========================
> Version: 1.0.4    Cluster Name: "ceph"
> Total hosts: [10]
> OSDs:    5    MONs:    3    Clients:    0
> MDSs:    0    RGWs:    0    MGRs:       2
>
> ================================================================================
>
> ---------- managers ----------
>  mon03
>  mon02
>  mon01
>
> ------------ osds ------------
>  node03
>  node02
>  node01
>  node05
>  node04
>
> ------------ mons ------------
>  mon01
>  mon03
>  mon02
>
> 107 passed, on 11 hosts
>
> On Mon, Aug 20, 2018 at 6:13 PM Alfredo Deza <adeza@xxxxxxxxxx> wrote:
> >
> > On Mon, Aug 20, 2018 at 10:23 AM, Daznis <daznis@xxxxxxxxx> wrote:
> > > Hello,
> > >
> > > It appears that something is horribly wrong with the cluster itself. I
> > > can't create or add any new OSDs to it at all.
> >
> > Have you added new monitors? Or replaced monitors? I would check that
> > all your versions match; something seems to be expecting different
> > versions.
> >
> > The "Invalid argument" problem is a common thing we see when that happens.
> >
> > Something that might help a bit here is if you run ceph-medic against
> > your cluster:
> >
> > http://docs.ceph.com/ceph-medic/master/
> >
> > > On Mon, Aug 20, 2018 at 11:04 AM Daznis <daznis@xxxxxxxxx> wrote:
> > >>
> > >> Hello,
> > >>
> > >> Zapping the journal didn't help. I tried to create the journal after
> > >> zapping it. That also failed. I'm not really sure why this happens.
> > >>
> > >> Looking at the monitor logs with 20/20 debug I'm seeing these errors:
> > >>
> > >> 2018-08-20 08:57:58.753 7f9d85934700  0 mon.mon02@1(peon) e4 handle_command mon_command({"prefix": "osd crush set-device-class", "class": "ssd", "ids": ["48"]} v 0) v1
> > >> 2018-08-20 08:57:58.753 7f9d85934700 20 is_capable service=osd command=osd crush set-device-class read write on cap allow profile osd
> > >> 2018-08-20 08:57:58.753 7f9d85934700 20  allow so far , doing grant allow profile osd
> > >> 2018-08-20 08:57:58.753 7f9d85934700 20 match
> > >> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon) e4 _allowed_command capable
> > >> 2018-08-20 08:57:58.753 7f9d85934700  0 log_channel(audit) log [INF] : from='osd.48 10.24.52.17:6800/153683' entity='osd.48' cmd=[{"prefix": "osd crush set-device-class", "class": "ssd", "ids": ["48"]}]: dispatch
> > >> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon).osd e46327 preprocess_query mon_command({"prefix": "osd crush set-device-class", "class": "ssd", "ids": ["48"]} v 0) v1 from osd.48 10.24.52.17:6800/153683
> > >> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon) e4 forward_request 4 request mon_command({"prefix": "osd crush set-device-class", "class": "ssd", "ids": ["48"]} v 0) v1 features 4611087854031142907
> > >> 2018-08-20 08:57:58.753 7f9d85934700 20 mon.mon02@1(peon) e4 _ms_dispatch existing session 0x55b4ec482a80 for mon.1 10.24.52.11:6789/0
> > >> 2018-08-20 08:57:58.753 7f9d85934700 20 mon.mon02@1(peon) e4 caps allow *
> > >> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon).log v10758065 preprocess_query log(1 entries from seq 4 at 2018-08-20 08:57:58.755306) v1 from mon.1 10.24.52.11:6789/0
> > >> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon).log v10758065 preprocess_log log(1 entries from seq 4 at 2018-08-20 08:57:58.755306) v1 from mon.1
> > >> 2018-08-20 08:57:58.753 7f9d85934700 20 is_capable service=log command= write on cap allow *
> > >> 2018-08-20 08:57:58.753 7f9d85934700 20  allow so far , doing grant allow *
> > >> 2018-08-20 08:57:58.753 7f9d85934700 20 allow all
> > >> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon) e4 forward_request 5 request log(1 entries from seq 4 at 2018-08-20 08:57:58.755306) v1 features 4611087854031142907
> > >> 2018-08-20 08:57:58.754 7f9d85934700 20 mon.mon02@1(peon) e4 _ms_dispatch existing session 0x55b4ec4828c0 for mon.0 10.24.52.10:6789/0
> > >> 2018-08-20 08:57:58.754 7f9d85934700 20 mon.mon02@1(peon) e4 caps allow *
> > >> 2018-08-20 08:57:58.754 7f9d85934700 20 is_capable service=mon command= read on cap allow *
> > >> 2018-08-20 08:57:58.754 7f9d85934700 20  allow so far , doing grant allow *
> > >> 2018-08-20 08:57:58.754 7f9d85934700 20 allow all
> > >> 2018-08-20 08:57:58.754 7f9d85934700 20 is_capable service=mon command= exec on cap allow *
> > >> 2018-08-20 08:57:58.754 7f9d85934700 20  allow so far , doing grant allow *
> > >> 2018-08-20 08:57:58.754 7f9d85934700 20 allow all
> > >> 2018-08-20 08:57:58.754 7f9d85934700 10 mon.mon02@1(peon) e4 handle_route mon_command_ack([{"prefix": "osd crush set-device-class", "class": "ssd", "ids": ["48"]}]=-22 (22) Invalid argument v46327) v1 to unknown.0 -
> > >> 2018-08-20 08:57:58.785 7f9d85934700 10 mon.mon02@1(peon) e4 ms_handle_reset 0x55b4ecf4b200 10.24.52.17:6800/153683
> > >> 2018-08-20 08:57:58.785 7f9d85934700 10 mon.mon02@1(peon) e4 reset/close on session osd.48 10.24.52.17:6800/153683
> > >> 2018-08-20 08:57:58.785 7f9d85934700 10 mon.mon02@1(peon) e4 remove_session 0x55b4ecf86380 osd.48 10.24.52.17:6800/153683 features 0x3ffddff8ffa4fffb
> > >> 2018-08-20 08:57:58.828 7f9d85934700 20 mon.mon02@1(peon) e4 _ms_dispatch existing session 0x55b4ec4828c0 for mon.0 10.24.52.10:6789/0
> > >>
> > >> On Sat, Aug 18, 2018 at 7:54 PM Daznis <daznis@xxxxxxxxx> wrote:
> > >> >
> > >> > Hello,
> > >> >
> > >> > Not sure about it. I assumed ceph-deploy would do it with the
> > >> > "--zap-disk" flag defined. I will try it on Monday and report the
> > >> > progress.
> > >> >
> > >> > On Sat, Aug 18, 2018 at 3:02 PM Alfredo Deza <adeza@xxxxxxxxxx> wrote:
> > >> > >
> > >> > > On Fri, Aug 17, 2018 at 7:05 PM, Daznis <daznis@xxxxxxxxx> wrote:
> > >> > > > Hello,
> > >> > > >
> > >> > > > I have replaced one of our failed OSD drives and recreated a new OSD
> > >> > > > with ceph-deploy, and it fails to start.
> > >> > >
> > >> > > Is it possible you haven't zapped the journal on nvme0n1p13 ?
> > >> > >
> > >> > > > Command: ceph-deploy --overwrite-conf osd create --filestore --zap-disk --data /dev/bcache0 --journal /dev/nvme0n1p13 <Hostname>
> > >> > > >
> > >> > > > Output of ceph-deploy:
> > >> > > > [ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
> > >> > > > [ceph_deploy.cli][INFO ] Invoked (2.0.1): /usr/bin/ceph-deploy --overwrite-conf osd create --filestore --zap-disk --data /dev/bcache0 --journal /dev/nvme0n1p13 <Hostname>
> > >> > > > [ceph_deploy.cli][INFO ] ceph-deploy options:
> > >> > > > [ceph_deploy.cli][INFO ]  verbose          : False
> > >> > > > [ceph_deploy.cli][INFO ]  bluestore        : None
> > >> > > > [ceph_deploy.cli][INFO ]  cd_conf          : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f8622160bd8>
> > >> > > > [ceph_deploy.cli][INFO ]  cluster          : ceph
> > >> > > > [ceph_deploy.cli][INFO ]  fs_type          : xfs
> > >> > > > [ceph_deploy.cli][INFO ]  block_wal        : None
> > >> > > > [ceph_deploy.cli][INFO ]  default_release  : False
> > >> > > > [ceph_deploy.cli][INFO ]  username         : None
> > >> > > > [ceph_deploy.cli][INFO ]  journal          : /dev/nvme0n1p13
> > >> > > > [ceph_deploy.cli][INFO ]  subcommand       : create
> > >> > > > [ceph_deploy.cli][INFO ]  host             : <Hostname>
> > >> > > > [ceph_deploy.cli][INFO ]  filestore        : True
> > >> > > > [ceph_deploy.cli][INFO ]  func             : <function osd at 0x7f8622194848>
> > >> > > > [ceph_deploy.cli][INFO ]  ceph_conf        : None
> > >> > > > [ceph_deploy.cli][INFO ]  zap_disk         : True
> > >> > > > [ceph_deploy.cli][INFO ]  data             : /dev/bcache0
> > >> > > > [ceph_deploy.cli][INFO ]  block_db         : None
> > >> > > > [ceph_deploy.cli][INFO ]  dmcrypt          : False
> > >> > > > [ceph_deploy.cli][INFO ]  overwrite_conf   : True
> > >> > > > [ceph_deploy.cli][INFO ]  dmcrypt_key_dir  : /etc/ceph/dmcrypt-keys
> > >> > > > [ceph_deploy.cli][INFO ]  quiet            : False
> > >> > > > [ceph_deploy.cli][INFO ]  debug            : False
> > >> > > > [ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device /dev/bcache0
> > >> > > > [<Hostname>][DEBUG ] connected to host: <Hostname>
> > >> > > > [<Hostname>][DEBUG ] detect platform information from remote host
> > >> > > > [<Hostname>][DEBUG ] detect machine type
> > >> > > > [<Hostname>][DEBUG ] find the location of an executable
> > >> > > > [ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.5.1804 Core
> > >> > > > [ceph_deploy.osd][DEBUG ] Deploying osd to <Hostname>
> > >> > > > [<Hostname>][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
> > >> > > > [<Hostname>][DEBUG ] find the location of an executable
> > >> > > > [ceph_deploy.osd][WARNIN] zapping is no longer supported when preparing
> > >> > > > [<Hostname>][INFO ] Running command: /usr/sbin/ceph-volume --cluster ceph lvm create --filestore --data /dev/bcache0 --journal /dev/nvme0n1p13
> > >> > > > [<Hostname>][DEBUG ] Running command: /bin/ceph-authtool --gen-print-key
> > >> > > > [<Hostname>][DEBUG ] Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new a503ae5e-b5b9-40d7-b8b3-194f15e52082
> > >> > > > [<Hostname>][DEBUG ] Running command: /usr/sbin/vgcreate --force --yes ceph-a1ffe5bb-6f06-49c6-8aec-e3eb3a311162 /dev/bcache0
> > >> > > > [<Hostname>][DEBUG ] stdout: Physical volume "/dev/bcache0" successfully created.
> > >> > > > [<Hostname>][DEBUG ] stdout: Volume group "ceph-a1ffe5bb-6f06-49c6-8aec-e3eb3a311162" successfully created
> > >> > > > [<Hostname>][DEBUG ] Running command: /usr/sbin/lvcreate --yes -l 100%FREE -n osd-data-a503ae5e-b5b9-40d7-b8b3-194f15e52082 ceph-a1ffe5bb-6f06-49c6-8aec-e3eb3a311162
> > >> > > > [<Hostname>][DEBUG ] stdout: Logical volume "osd-data-a503ae5e-b5b9-40d7-b8b3-194f15e52082" created.
> > >> > > > [<Hostname>][DEBUG ] Running command: /bin/ceph-authtool --gen-print-key
> > >> > > > [<Hostname>][DEBUG ] Running command: /usr/sbin/mkfs -t xfs -f -i size=2048 /dev/ceph-a1ffe5bb-6f06-49c6-8aec-e3eb3a311162/osd-data-a503ae5e-b5b9-40d7-b8b3-194f15e52082
> > >> > > > [<Hostname>][DEBUG ] stdout: meta-data=/dev/ceph-a1ffe5bb-6f06-49c6-8aec-e3eb3a311162/osd-data-a503ae5e-b5b9-40d7-b8b3-194f15e52082 isize=2048 agcount=4, agsize=244154112 blks
> > >> > > > [<Hostname>][DEBUG ]          =              sectsz=512   attr=2, projid32bit=1
> > >> > > > [<Hostname>][DEBUG ]          =              crc=1        finobt=0, sparse=0
> > >> > > > [<Hostname>][DEBUG ] data     =              bsize=4096   blocks=976616448, imaxpct=5
> > >> > > > [<Hostname>][DEBUG ]          =              sunit=0      swidth=0 blks
> > >> > > > [<Hostname>][DEBUG ] naming   =version 2     bsize=4096   ascii-ci=0 ftype=1
> > >> > > > [<Hostname>][DEBUG ] log      =internal log  bsize=4096   blocks=476863, version=2
> > >> > > > [<Hostname>][DEBUG ]          =              sectsz=512   sunit=0 blks, lazy-count=1
> > >> > > > [<Hostname>][DEBUG ] realtime =none          extsz=4096   blocks=0, rtextents=0
> > >> > > > [<Hostname>][DEBUG ] Running command: /bin/mount -t xfs -o rw,noatime,inode64,noquota,nodiratime,logbufs=8,logbsize=256k,attr2 /dev/ceph-a1ffe5bb-6f06-49c6-8aec-e3eb3a311162/osd-data-a503ae5e-b5b9-40d7-b8b3-194f15e52082 /var/lib/ceph/osd/ceph-48
> > >> > > > [<Hostname>][DEBUG ] Running command: /bin/chown -R ceph:ceph /dev/nvme0n1p13
> > >> > > > [<Hostname>][DEBUG ] Running command: /bin/ln -s /dev/nvme0n1p13 /var/lib/ceph/osd/ceph-48/journal
> > >> > > > [<Hostname>][DEBUG ] Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-48/activate.monmap
> > >> > > > [<Hostname>][DEBUG ] stderr: got monmap epoch 4
> > >> > > > [<Hostname>][DEBUG ] Running command: /bin/chown -R ceph:ceph /dev/nvme0n1p13
> > >> > > > [<Hostname>][DEBUG ] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-48/
> > >> > > > [<Hostname>][DEBUG ] Running command: /bin/ceph-osd --cluster ceph --osd-objectstore filestore --mkfs -i 48 --monmap /var/lib/ceph/osd/ceph-48/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-48/ --osd-journal /var/lib/ceph/osd/ceph-48/journal --osd-uuid a503ae5e-b5b9-40d7-b8b3-194f15e52082 --setuser ceph --setgroup ceph
> > >> > > > [<Hostname>][DEBUG ] stderr: 2018-08-17 18:23:26.067 7f1f62c471c0 -1 auth: unable to find a keyring on /var/lib/ceph/osd/ceph-48//keyring: (2) No such file or directory
> > >> > > > [<Hostname>][DEBUG ] stderr: 2018-08-17 18:23:26.188 7f1f62c471c0 -1 journal read_header error decoding journal header
> > >> > > > [<Hostname>][DEBUG ] stderr: 2018-08-17 18:23:26.198 7f1f62c471c0 -1 journal do_read_entry(4096): bad header magic
> > >> > > > [<Hostname>][DEBUG ] stderr: 2018-08-17 18:23:26.198 7f1f62c471c0 -1 journal do_read_entry(4096): bad header magic
> > >> > > > [<Hostname>][DEBUG ] Running command: /bin/ceph-authtool /var/lib/ceph/osd/ceph-48/keyring --create-keyring --name osd.48 --add-key AQB39nZbiXJBMBAAOb9cxepxJrflhSNADuVNSQ==
> > >> > > > [<Hostname>][DEBUG ] stdout: creating /var/lib/ceph/osd/ceph-48/keyring
> > >> > > > [<Hostname>][DEBUG ] added entity osd.48 auth auth(auid = 18446744073709551615 key=AQB39nZbiXJBMBAAOb9cxepxJrflhSNADuVNSQ== with 0 caps)
> > >> > > > [<Hostname>][DEBUG ] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-48/keyring
> > >> > > > [<Hostname>][DEBUG ] --> ceph-volume lvm prepare successful for: /dev/bcache0
> > >> > > > [<Hostname>][DEBUG ] Running command: /bin/ln -snf /dev/nvme0n1p13 /var/lib/ceph/osd/ceph-48/journal
> > >> > > > [<Hostname>][DEBUG ] Running command: /bin/chown -R ceph:ceph /dev/nvme0n1p13
> > >> > > > [<Hostname>][DEBUG ] Running command: /bin/systemctl enable ceph-volume@lvm-48-a503ae5e-b5b9-40d7-b8b3-194f15e52082
> > >> > > > [<Hostname>][DEBUG ] stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-48-a503ae5e-b5b9-40d7-b8b3-194f15e52082.service to /usr/lib/systemd/system/ceph-volume@.service.
> > >> > > > [<Hostname>][DEBUG ] Running command: /bin/systemctl start ceph-osd@48
> > >> > > > [<Hostname>][DEBUG ] --> ceph-volume lvm activate successful for osd ID: 48
> > >> > > > [<Hostname>][DEBUG ] --> ceph-volume lvm create successful for: /dev/bcache0
> > >> > > > [<Hostname>][INFO ] checking OSD status...
> > >> > > > [<Hostname>][DEBUG ] find the location of an executable
> > >> > > > [<Hostname>][INFO ] Running command: /bin/ceph --cluster=ceph osd stat --format=json
> > >> > > >
> > >> > > > When the OSD service is starting I'm getting these errors:
> > >> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 _get_class not permitted to load lua
> > >> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  1 osd.48 0 warning: got an error loading one or more classes: (1) Operation not permitted
> > >> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 osd.48 0 crush map has features 288232575208783872, adjusting msgr requires for clients
> > >> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 osd.48 0 crush map has features 288232575208783872 was 8705, adjusting msgr requires for mons
> > >> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 osd.48 0 crush map has features 288232575208783872, adjusting msgr requires for osds
> > >> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 osd.48 0 load_pgs
> > >> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 osd.48 0 load_pgs opened 0 pgs
> > >> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 osd.48 0 using weightedpriority op queue with priority op cut off at 64.
> > >> > > > 2018-08-17 19:12:02.999 7fd06e5a91c0 -1 osd.48 0 log_to_monitors {default=true}
> > >> > > > 2018-08-17 19:12:03.004 7fd06e5a91c0 -1 osd.48 0 mon_cmd_maybe_osd_create fail: '(22) Invalid argument': (22) Invalid argument
> > >> > > > 2018-08-17 19:12:03.004 7fd06e5a91c0 -1 osd.48 0 init unable to update_crush_device_class: (22) Invalid argument
> > >> > > >
> > >> > > > So I tried to add the OSD to the crush map with "ceph osd crush add osd.48 4.0 host=<Hostname>", but the same error appears: Error EINVAL: (22) Invalid argument. Trying to set the device class also fails with the same error.
> > >> > > >
> > >> > > > If I manually add the OSD to the crush map, crushtool fails to compile the map with errors.
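
P.S. In case someone else hits this, here is a minimal sketch of the kind of commands involved. The host name "node06" below is made up for illustration, and the OSD id/weight are simply the ones from this thread; adjust both to your own cluster.

    # Show the CRUSH hierarchy; an empty host bucket shows up under the
    # default root with no osd.* entries beneath it.
    ceph osd tree

    # Dropping an OSD into the empty bucket (the "fake" entry mentioned
    # at the top of this mail) was enough to get things moving again:
    ceph osd crush add osd.48 4.0 host=node06

    # Later the OSD can be placed where it really belongs and the
    # now-empty bucket removed:
    ceph osd crush set osd.48 4.0 host=<Hostname>
    ceph osd crush remove node06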