Re: Mimic osd fails to start.

Hello,


ceph-medic shows everything as fine. The whole cluster is on the latest
Mimic version: it was upgraded to Mimic when the first stable Mimic
release came out, and recently it was updated to "ceph version 13.2.1
(5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)". For some
reason one mgr service is running but not connected to the cluster.
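
In case it helps with the disconnected mgr, the standard CLI is enough to
see which mgr the monitors consider active (a minimal check, nothing
cluster-specific assumed):

    ceph -s          # the status output names the active mgr and any standbys
    ceph mgr dump    # full mgr map: active_name, available flag, standbys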

Versions output:

{
    "mon": {
        "ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)": 3
    },
    "mgr": {
        "ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)": 2
    },
    "osd": {
        "ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)": 47
    },
    "mds": {},
    "overall": {
        "ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)": 52
    }
}
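
(That JSON looks like the output of "ceph versions"; "ceph features" is a
related check that reports the release/feature bits of whatever is
currently connected, which may help when comparing against the feature
values that show up in the mon log further down:)

    ceph versions    # per-daemon-type version summary, as pasted above
    ceph features    # feature bits reported by connected daemons and clients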

ceph-medic output:
=======================  Starting remote check session  ========================
Version: 1.0.4    Cluster Name: "ceph"
Total hosts: [10]
OSDs:    5    MONs:    3     Clients:    0
MDSs:    0    RGWs:    0     MGRs:       2

================================================================================

---------- managers ----------
 mon03
 mon02
 mon01

------------ osds ------------
 node03
 node02
 node01
 node05
 node04

------------ mons ------------
 mon01
 mon03
 mon02

107 passed, on 11 hosts
On Mon, Aug 20, 2018 at 6:13 PM Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>
> On Mon, Aug 20, 2018 at 10:23 AM, Daznis <daznis@xxxxxxxxx> wrote:
> > Hello,
> >
> > It appears that something is horribly wrong with the cluster itself. I
> > can't create or add any new osds to it at all.
>
> Have you added new monitors? Or replaced monitors? I would check that
> all your versions match; something seems to be expecting different
> versions.
>
> The "Invalid argument" problem is a common thing we see when that happens.
>
> Something that might help a bit here is if you run ceph-medic against
> your cluster:
>
> http://docs.ceph.com/ceph-medic/master/
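>
> (A typical ceph-medic run, assuming an ansible-style inventory file for
> the cluster hosts and working SSH access to them, is roughly:
>
>     ceph-medic --inventory /path/to/hosts --ssh-config /path/to/ssh_config check
>
> with both paths being placeholders.)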
>
>
>
> > On Mon, Aug 20, 2018 at 11:04 AM Daznis <daznis@xxxxxxxxx> wrote:
> >>
> >> Hello,
> >>
> >>
> >> Zapping the journal didn't help. I tried to recreate the journal after
> >> zapping it; that also failed. I'm not really sure why this happens.
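> >>
> >> (For reference, zapping with ceph-volume directly looks roughly like
> >> this, reusing the device names from the ceph-deploy command further
> >> down; a sketch, not necessarily exactly what was run here:)
> >>
> >>     ceph-volume lvm zap /dev/nvme0n1p13          # wipe the journal partition
> >>     ceph-volume lvm zap --destroy /dev/bcache0   # wipe and tear down the LVM pieces on the data device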
> >>
> >> Looking at the monitor logs with 20/20 debug, I'm seeing these errors:
> >>
> >> 2018-08-20 08:57:58.753 7f9d85934700  0 mon.mon02@1(peon) e4
> >> handle_command mon_command({"prefix": "osd crush set-device-class",
> >> "class": "ssd", "ids": ["48"]} v 0) v1
> >> 2018-08-20 08:57:58.753 7f9d85934700 20 is_capable service=osd
> >> command=osd crush set-device-class read write on cap allow profile osd
> >> 2018-08-20 08:57:58.753 7f9d85934700 20  allow so far , doing grant
> >> allow profile osd
> >> 2018-08-20 08:57:58.753 7f9d85934700 20  match
> >> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon) e4
> >> _allowed_command capable
> >> 2018-08-20 08:57:58.753 7f9d85934700  0 log_channel(audit) log [INF] :
> >> from='osd.48 10.24.52.17:6800/153683' entity='osd.48' cmd=[{"prefix":
> >> "osd crush set-device-class", "class": "ssd", "ids": ["48"]}]:
> >> dispatch
> >> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon).osd e46327
> >> preprocess_query mon_command({"prefix": "osd crush set-device-class",
> >> "class": "ssd", "ids": ["48"]} v 0) v1 from osd.48
> >> 10.24.52.17:6800/153683
> >> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon) e4
> >> forward_request 4 request mon_command({"prefix": "osd crush
> >> set-device-class", "class": "ssd", "ids": ["48"]} v 0) v1 features
> >> 4611087854031142907
> >> 2018-08-20 08:57:58.753 7f9d85934700 20 mon.mon02@1(peon) e4
> >> _ms_dispatch existing session 0x55b4ec482a80 for mon.1
> >> 10.24.52.11:6789/0
> >> 2018-08-20 08:57:58.753 7f9d85934700 20 mon.mon02@1(peon) e4  caps allow *
> >> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon).log
> >> v10758065 preprocess_query log(1 entries from seq 4 at 2018-08-20
> >> 08:57:58.755306) v1 from mon.1 10.24.52.11:6789/0
> >> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon).log
> >> v10758065 preprocess_log log(1 entries from seq 4 at 2018-08-20
> >> 08:57:58.755306) v1 from mon.1
> >> 2018-08-20 08:57:58.753 7f9d85934700 20 is_capable service=log
> >> command= write on cap allow *
> >> 2018-08-20 08:57:58.753 7f9d85934700 20  allow so far , doing grant allow *
> >> 2018-08-20 08:57:58.753 7f9d85934700 20  allow all
> >> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon) e4
> >> forward_request 5 request log(1 entries from seq 4 at 2018-08-20
> >> 08:57:58.755306) v1 features 4611087854031142907
> >> 2018-08-20 08:57:58.754 7f9d85934700 20 mon.mon02@1(peon) e4
> >> _ms_dispatch existing session 0x55b4ec4828c0 for mon.0
> >> 10.24.52.10:6789/0
> >> 2018-08-20 08:57:58.754 7f9d85934700 20 mon.mon02@1(peon) e4  caps allow *
> >> 2018-08-20 08:57:58.754 7f9d85934700 20 is_capable service=mon
> >> command= read on cap allow *
> >> 2018-08-20 08:57:58.754 7f9d85934700 20  allow so far , doing grant allow *
> >> 2018-08-20 08:57:58.754 7f9d85934700 20  allow all
> >> 2018-08-20 08:57:58.754 7f9d85934700 20 is_capable service=mon
> >> command= exec on cap allow *
> >> 2018-08-20 08:57:58.754 7f9d85934700 20  allow so far , doing grant allow *
> >> 2018-08-20 08:57:58.754 7f9d85934700 20  allow all
> >> 2018-08-20 08:57:58.754 7f9d85934700 10 mon.mon02@1(peon) e4
> >> handle_route mon_command_ack([{"prefix": "osd crush set-device-class",
> >> "class": "ssd", "ids": ["48"]}]=-22 (22) Invalid argument v46327) v1
> >> to unknown.0 -
> >> 2018-08-20 08:57:58.785 7f9d85934700 10 mon.mon02@1(peon) e4
> >> ms_handle_reset 0x55b4ecf4b200 10.24.52.17:6800/153683
> >> 2018-08-20 08:57:58.785 7f9d85934700 10 mon.mon02@1(peon) e4
> >> reset/close on session osd.48 10.24.52.17:6800/153683
> >> 2018-08-20 08:57:58.785 7f9d85934700 10 mon.mon02@1(peon) e4
> >> remove_session 0x55b4ecf86380 osd.48 10.24.52.17:6800/153683 features
> >> 0x3ffddff8ffa4fffb
> >> 2018-08-20 08:57:58.828 7f9d85934700 20 mon.mon02@1(peon) e4
> >> _ms_dispatch existing session 0x55b4ec4828c0 for mon.0
> >> 10.24.52.10:6789/0
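> >>
> >> (Side note: the command the mon is rejecting can also be run by hand,
> >> which sometimes gives a clearer error than the OSD's init path; a
> >> minimal sketch, assuming an admin keyring on the node and OSD id 48:)
> >>
> >>     ceph osd crush class ls                  # device classes the crush map already knows
> >>     ceph osd crush tree --show-shadow        # per-class shadow trees, if any
> >>     ceph osd crush set-device-class ssd 48   # the call that returns EINVAL in the log above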
> >> On Sat, Aug 18, 2018 at 7:54 PM Daznis <daznis@xxxxxxxxx> wrote:
> >> >
> >> > Hello,
> >> >
> >> > Not sure about that. I assumed ceph-deploy would do it with the
> >> > "--zap-disk" flag defined. I will try it on Monday and report the
> >> > progress.
> >> > On Sat, Aug 18, 2018 at 3:02 PM Alfredo Deza <adeza@xxxxxxxxxx> wrote:
> >> > >
> >> > > On Fri, Aug 17, 2018 at 7:05 PM, Daznis <daznis@xxxxxxxxx> wrote:
> >> > > > Hello,
> >> > > >
> >> > > >
> >> > > > I have replaced one of our failed OSD drives and created a new OSD
> >> > > > with ceph-deploy, but it fails to start.
> >> > >
> >> > > Is it possible you haven't zapped the journal on nvme0n1p13 ?
> >> > >
> >> > >
> >> > >
> >> > > >
> >> > > > Command: ceph-deploy --overwrite-conf osd create --filestore
> >> > > > --zap-disk --data /dev/bcache0 --journal /dev/nvme0n1p13 <Hostname>
> >> > > >
> >> > > > Output of ceph-deploy:
> >> > > > [ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
> >> > > > [ceph_deploy.cli][INFO  ] Invoked (2.0.1): /usr/bin/ceph-deploy
> >> > > > --overwrite-conf osd create --filestore --zap-disk --data /dev/bcache0
> >> > > > --journal /dev/nvme0n1p13 <Hostname>
> >> > > > [ceph_deploy.cli][INFO  ] ceph-deploy options:
> >> > > > [ceph_deploy.cli][INFO  ]  verbose                       : False
> >> > > > [ceph_deploy.cli][INFO  ]  bluestore                     : None
> >> > > > [ceph_deploy.cli][INFO  ]  cd_conf                       :
> >> > > > <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f8622160bd8>
> >> > > > [ceph_deploy.cli][INFO  ]  cluster                       : ceph
> >> > > > [ceph_deploy.cli][INFO  ]  fs_type                       : xfs
> >> > > > [ceph_deploy.cli][INFO  ]  block_wal                     : None
> >> > > > [ceph_deploy.cli][INFO  ]  default_release               : False
> >> > > > [ceph_deploy.cli][INFO  ]  username                      : None
> >> > > > [ceph_deploy.cli][INFO  ]  journal                       : /dev/nvme0n1p13
> >> > > > [ceph_deploy.cli][INFO  ]  subcommand                    : create
> >> > > > [ceph_deploy.cli][INFO  ]  host                          : <Hostname>
> >> > > > [ceph_deploy.cli][INFO  ]  filestore                     : True
> >> > > > [ceph_deploy.cli][INFO  ]  func                          : <function
> >> > > > osd at 0x7f8622194848>
> >> > > > [ceph_deploy.cli][INFO  ]  ceph_conf                     : None
> >> > > > [ceph_deploy.cli][INFO  ]  zap_disk                      : True
> >> > > > [ceph_deploy.cli][INFO  ]  data                          : /dev/bcache0
> >> > > > [ceph_deploy.cli][INFO  ]  block_db                      : None
> >> > > > [ceph_deploy.cli][INFO  ]  dmcrypt                       : False
> >> > > > [ceph_deploy.cli][INFO  ]  overwrite_conf                : True
> >> > > > [ceph_deploy.cli][INFO  ]  dmcrypt_key_dir               :
> >> > > > /etc/ceph/dmcrypt-keys
> >> > > > [ceph_deploy.cli][INFO  ]  quiet                         : False
> >> > > > [ceph_deploy.cli][INFO  ]  debug                         : False
> >> > > > [ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data
> >> > > > device /dev/bcache0
> >> > > > [<Hostname>][DEBUG ] connected to host: <Hostname>
> >> > > > [<Hostname>][DEBUG ] detect platform information from remote host
> >> > > > [<Hostname>][DEBUG ] detect machine type
> >> > > > [<Hostname>][DEBUG ] find the location of an executable
> >> > > > [ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.5.1804 Core
> >> > > > [ceph_deploy.osd][DEBUG ] Deploying osd to <Hostname>
> >> > > > [<Hostname>][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
> >> > > > [<Hostname>][DEBUG ] find the location of an executable
> >> > > > [ceph_deploy.osd][WARNIN] zapping is no longer supported when preparing
> >> > > > [<Hostname>][INFO  ] Running command: /usr/sbin/ceph-volume --cluster
> >> > > > ceph lvm create --filestore --data /dev/bcache0 --journal
> >> > > > /dev/nvme0n1p13
> >> > > > [<Hostname>][DEBUG ] Running command: /bin/ceph-authtool --gen-print-key
> >> > > > [<Hostname>][DEBUG ] Running command: /bin/ceph --cluster ceph --name
> >> > > > client.bootstrap-osd --keyring
> >> > > > /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
> >> > > > a503ae5e-b5b9-40d7-b8b3-194f15e52082
> >> > > > [<Hostname>][DEBUG ] Running command: /usr/sbin/vgcreate --force --yes
> >> > > > ceph-a1ffe5bb-6f06-49c6-8aec-e3eb3a311162 /dev/bcache0
> >> > > > [<Hostname>][DEBUG ]  stdout: Physical volume "/dev/bcache0"
> >> > > > successfully created.
> >> > > > [<Hostname>][DEBUG ]  stdout: Volume group
> >> > > > "ceph-a1ffe5bb-6f06-49c6-8aec-e3eb3a311162" successfully created
> >> > > > [<Hostname>][DEBUG ] Running command: /usr/sbin/lvcreate --yes -l
> >> > > > 100%FREE -n osd-data-a503ae5e-b5b9-40d7-b8b3-194f15e52082
> >> > > > ceph-a1ffe5bb-6f06-49c6-8aec-e3eb3a311162
> >> > > > [<Hostname>][DEBUG ]  stdout: Logical volume
> >> > > > "osd-data-a503ae5e-b5b9-40d7-b8b3-194f15e52082" created.
> >> > > > [<Hostname>][DEBUG ] Running command: /bin/ceph-authtool --gen-print-key
> >> > > > [<Hostname>][DEBUG ] Running command: /usr/sbin/mkfs -t xfs -f -i
> >> > > > size=2048 /dev/ceph-a1ffe5bb-6f06-49c6-8aec-e3eb3a311162/osd-data-a503ae5e-b5b9-40d7-b8b3-194f15e52082
> >> > > > [<Hostname>][DEBUG ]  stdout:
> >> > > > meta-data=/dev/ceph-a1ffe5bb-6f06-49c6-8aec-e3eb3a311162/osd-data-a503ae5e-b5b9-40d7-b8b3-194f15e52082
> >> > > > isize=2048   agcount=4, agsize=244154112 blks
> >> > > > [<Hostname>][DEBUG ]          =                       sectsz=512
> >> > > > attr=2, projid32bit=1
> >> > > > [<Hostname>][DEBUG ]          =                       crc=1
> >> > > > finobt=0, sparse=0
> >> > > > [<Hostname>][DEBUG ] data     =                       bsize=4096
> >> > > > blocks=976616448, imaxpct=5
> >> > > > [<Hostname>][DEBUG ]          =                       sunit=0      swidth=0 blks
> >> > > > [<Hostname>][DEBUG ] naming   =version 2              bsize=4096
> >> > > > ascii-ci=0 ftype=1
> >> > > > [<Hostname>][DEBUG ] log      =internal log           bsize=4096
> >> > > > blocks=476863, version=2
> >> > > > [<Hostname>][DEBUG ]          =                       sectsz=512
> >> > > > sunit=0 blks, lazy-count=1
> >> > > > [<Hostname>][DEBUG ] realtime =none                   extsz=4096
> >> > > > blocks=0, rtextents=0
> >> > > > [<Hostname>][DEBUG ] Running command: /bin/mount -t xfs -o
> >> > > > rw,noatime,inode64,noquota,nodiratime,logbufs=8,logbsize=256k,attr2
> >> > > > /dev/ceph-a1ffe5bb-6f06-49c6-8aec-e3eb3a311162/osd-data-a503ae5e-b5b9-40d7-b8b3-194f15e52082
> >> > > > /var/lib/ceph/osd/ceph-48
> >> > > > [<Hostname>][DEBUG ] Running command: /bin/chown -R ceph:ceph /dev/nvme0n1p13
> >> > > > [<Hostname>][DEBUG ] Running command: /bin/ln -s /dev/nvme0n1p13
> >> > > > /var/lib/ceph/osd/ceph-48/journal
> >> > > > [<Hostname>][DEBUG ] Running command: /bin/ceph --cluster ceph --name
> >> > > > client.bootstrap-osd --keyring
> >> > > > /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o
> >> > > > /var/lib/ceph/osd/ceph-48/activate.monmap
> >> > > > [<Hostname>][DEBUG ]  stderr: got monmap epoch 4
> >> > > > [<Hostname>][DEBUG ] Running command: /bin/chown -R ceph:ceph /dev/nvme0n1p13
> >> > > > [<Hostname>][DEBUG ] Running command: /bin/chown -R ceph:ceph
> >> > > > /var/lib/ceph/osd/ceph-48/
> >> > > > [<Hostname>][DEBUG ] Running command: /bin/ceph-osd --cluster ceph
> >> > > > --osd-objectstore filestore --mkfs -i 48 --monmap
> >> > > > /var/lib/ceph/osd/ceph-48/activate.monmap --keyfile - --osd-data
> >> > > > /var/lib/ceph/osd/ceph-48/ --osd-journal
> >> > > > /var/lib/ceph/osd/ceph-48/journal --osd-uuid
> >> > > > a503ae5e-b5b9-40d7-b8b3-194f15e52082 --setuser ceph --setgroup ceph
> >> > > > [<Hostname>][DEBUG ]  stderr: 2018-08-17 18:23:26.067 7f1f62c471c0 -1
> >> > > > auth: unable to find a keyring on /var/lib/ceph/osd/ceph-48//keyring:
> >> > > > (2) No such file or directory
> >> > > > [<Hostname>][DEBUG ]  stderr: 2018-08-17 18:23:26.188 7f1f62c471c0 -1
> >> > > > journal read_header error decoding journal header
> >> > > > [<Hostname>][DEBUG ]  stderr: 2018-08-17 18:23:26.198 7f1f62c471c0 -1
> >> > > > journal do_read_entry(4096): bad header magic
> >> > > > [<Hostname>][DEBUG ]  stderr: 2018-08-17 18:23:26.198 7f1f62c471c0 -1
> >> > > > journal do_read_entry(4096): bad header magic
> >> > > > [<Hostname>][DEBUG ] Running command: /bin/ceph-authtool
> >> > > > /var/lib/ceph/osd/ceph-48/keyring --create-keyring --name osd.48
> >> > > > --add-key AQB39nZbiXJBMBAAOb9cxepxJrflhSNADuVNSQ==
> >> > > > [<Hostname>][DEBUG ]  stdout: creating /var/lib/ceph/osd/ceph-48/keyring
> >> > > > [<Hostname>][DEBUG ] added entity osd.48 auth auth(auid =
> >> > > > 18446744073709551615 key=AQB39nZbiXJBMBAAOb9cxepxJrflhSNADuVNSQ== with
> >> > > > 0 caps)
> >> > > > [<Hostname>][DEBUG ] Running command: /bin/chown -R ceph:ceph
> >> > > > /var/lib/ceph/osd/ceph-48/keyring
> >> > > > [<Hostname>][DEBUG ] --> ceph-volume lvm prepare successful for: /dev/bcache0
> >> > > > [<Hostname>][DEBUG ] Running command: /bin/ln -snf /dev/nvme0n1p13
> >> > > > /var/lib/ceph/osd/ceph-48/journal
> >> > > > [<Hostname>][DEBUG ] Running command: /bin/chown -R ceph:ceph /dev/nvme0n1p13
> >> > > > [<Hostname>][DEBUG ] Running command: /bin/systemctl enable
> >> > > > ceph-volume@lvm-48-a503ae5e-b5b9-40d7-b8b3-194f15e52082
> >> > > > [<Hostname>][DEBUG ]  stderr: Created symlink from
> >> > > > /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-48-a503ae5e-b5b9-40d7-b8b3-194f15e52082.service
> >> > > > to /usr/lib/systemd/system/ceph-volume@.service.
> >> > > > [<Hostname>][DEBUG ] Running command: /bin/systemctl start ceph-osd@48
> >> > > > [<Hostname>][DEBUG ] --> ceph-volume lvm activate successful for osd ID: 48
> >> > > > [<Hostname>][DEBUG ] --> ceph-volume lvm create successful for: /dev/bcache0
> >> > > > [<Hostname>][INFO  ] checking OSD status...
> >> > > > [<Hostname>][DEBUG ] find the location of an executable
> >> > > > [<Hostname>][INFO  ] Running command: /bin/ceph --cluster=ceph osd
> >> > > > stat --format=json
> >> > > >
> >> > > >
> >> > > > When the OSD service starts, I'm getting these errors:
> >> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 _get_class not permitted to load lua
> >> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  1 osd.48 0 warning: got an error
> >> > > > loading one or more classes: (1) Operation not permitted
> >> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 osd.48 0 crush map has
> >> > > > features 288232575208783872, adjusting msgr requires for clients
> >> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 osd.48 0 crush map has
> >> > > > features 288232575208783872 was 8705, adjusting msgr requires for mons
> >> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 osd.48 0 crush map has
> >> > > > features 288232575208783872, adjusting msgr requires for osds
> >> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 osd.48 0 load_pgs
> >> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 osd.48 0 load_pgs opened 0 pgs
> >> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 osd.48 0 using
> >> > > > weightedpriority op queue with priority op cut off at 64.
> >> > > > 2018-08-17 19:12:02.999 7fd06e5a91c0 -1 osd.48 0 log_to_monitors {default=true}
> >> > > > 2018-08-17 19:12:03.004 7fd06e5a91c0 -1 osd.48 0
> >> > > > mon_cmd_maybe_osd_create fail: '(22) Invalid argument': (22) Invalid
> >> > > > argument
> >> > > > 2018-08-17 19:12:03.004 7fd06e5a91c0 -1 osd.48 0 init unable to
> >> > > > update_crush_device_class: (22) Invalid argument
> >> > > >
> >> > > >
> >> > > > So I tried to add the OSD to the crush map with "ceph osd crush add
> >> > > > osd.48 4.0 host=<Hostname>", and the same error 22 appears: Error
> >> > > > EINVAL: (22) Invalid argument. Trying to set the device class also
> >> > > > fails with the same error.
> >> > > >
> >> > > > If I manually add the OSD to the crush map, crushtool fails to
> >> > > > compile the map with errors.
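> >> > > >
> >> > > > (The manual crush map round trip referred to above is roughly the
> >> > > > following; the file paths are only examples:)
> >> > > >
> >> > > >     ceph osd getcrushmap -o /tmp/crushmap.bin             # dump the compiled map
> >> > > >     crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt   # decompile to editable text
> >> > > >     # edit /tmp/crushmap.txt to add osd.48 under its host bucket
> >> > > >     crushtool -c /tmp/crushmap.txt -o /tmp/crushmap.new   # recompile (the step that errors out here)
> >> > > >     ceph osd setcrushmap -i /tmp/crushmap.new             # inject only if the compile succeeds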
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


