Re: Mimic osd fails to start.

On Mon, Aug 20, 2018 at 10:23 AM, Daznis <daznis@xxxxxxxxx> wrote:
> Hello,
>
> It appears that something is horribly wrong with the cluster itself. I
> can't create or add any new OSDs to it at all.

Have you added or replaced any monitors? I would check that all your
versions match; something in the cluster seems to be expecting a
different version.
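
A quick way to confirm is to compare what every daemon reports;
something along these lines should do it (adjust to taste):

    ceph versions
    ceph tell mon.* version
    ceph tell osd.* version

All the mons and OSDs should come back with the same release. A monitor
still on an older version would explain it rejecting a command it does
not fully understand.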

The "Invalid argument" problem is a common thing we see when that happens.

Something that might help a bit here is if you run ceph-medic against
your cluster:

http://docs.ceph.com/ceph-medic/master/
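
If you have it installed, the basic invocation (if memory serves) is:

    ceph-medic check

(it can also be pointed at a hosts/inventory file, see the docs above).
It will flag things like mismatched package versions and missing or
unexpected keyrings across the nodes.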



> On Mon, Aug 20, 2018 at 11:04 AM Daznis <daznis@xxxxxxxxx> wrote:
>>
>> Hello,
>>
>>
>> Zapping the journal didn't help. I also tried recreating the journal
>> after zapping it, and that failed as well. I'm not really sure why this
>> happens.
>>
>> Looking at the monitor logs with 20/20 debug, I'm seeing these errors:
>>
>> 2018-08-20 08:57:58.753 7f9d85934700  0 mon.mon02@1(peon) e4
>> handle_command mon_command({"prefix": "osd crush set-device-class",
>> "class": "ssd", "ids": ["48"]} v 0) v1
>> 2018-08-20 08:57:58.753 7f9d85934700 20 is_capable service=osd
>> command=osd crush set-device-class read write on cap allow profile osd
>> 2018-08-20 08:57:58.753 7f9d85934700 20  allow so far , doing grant
>> allow profile osd
>> 2018-08-20 08:57:58.753 7f9d85934700 20  match
>> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon) e4
>> _allowed_command capable
>> 2018-08-20 08:57:58.753 7f9d85934700  0 log_channel(audit) log [INF] :
>> from='osd.48 10.24.52.17:6800/153683' entity='osd.48' cmd=[{"prefix":
>> "osd crush set-device-class", "class": "ssd", "ids": ["48"]}]:
>> dispatch
>> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon).osd e46327
>> preprocess_query mon_command({"prefix": "osd crush set-device-class",
>> "class": "ssd", "ids": ["48"]} v 0) v1 from osd.48
>> 10.24.52.17:6800/153683
>> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon) e4
>> forward_request 4 request mon_command({"prefix": "osd crush
>> set-device-class", "class": "ssd", "ids": ["48"]} v 0) v1 features
>> 4611087854031142907
>> 2018-08-20 08:57:58.753 7f9d85934700 20 mon.mon02@1(peon) e4
>> _ms_dispatch existing session 0x55b4ec482a80 for mon.1
>> 10.24.52.11:6789/0
>> 2018-08-20 08:57:58.753 7f9d85934700 20 mon.mon02@1(peon) e4  caps allow *
>> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon).log
>> v10758065 preprocess_query log(1 entries from seq 4 at 2018-08-20
>> 08:57:58.755306) v1 from mon.1 10.24.52.11:6789/0
>> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon).log
>> v10758065 preprocess_log log(1 entries from seq 4 at 2018-08-20
>> 08:57:58.755306) v1 from mon.1
>> 2018-08-20 08:57:58.753 7f9d85934700 20 is_capable service=log
>> command= write on cap allow *
>> 2018-08-20 08:57:58.753 7f9d85934700 20  allow so far , doing grant allow *
>> 2018-08-20 08:57:58.753 7f9d85934700 20  allow all
>> 2018-08-20 08:57:58.753 7f9d85934700 10 mon.mon02@1(peon) e4
>> forward_request 5 request log(1 entries from seq 4 at 2018-08-20
>> 08:57:58.755306) v1 features 4611087854031142907
>> 2018-08-20 08:57:58.754 7f9d85934700 20 mon.mon02@1(peon) e4
>> _ms_dispatch existing session 0x55b4ec4828c0 for mon.0
>> 10.24.52.10:6789/0
>> 2018-08-20 08:57:58.754 7f9d85934700 20 mon.mon02@1(peon) e4  caps allow *
>> 2018-08-20 08:57:58.754 7f9d85934700 20 is_capable service=mon
>> command= read on cap allow *
>> 2018-08-20 08:57:58.754 7f9d85934700 20  allow so far , doing grant allow *
>> 2018-08-20 08:57:58.754 7f9d85934700 20  allow all
>> 2018-08-20 08:57:58.754 7f9d85934700 20 is_capable service=mon
>> command= exec on cap allow *
>> 2018-08-20 08:57:58.754 7f9d85934700 20  allow so far , doing grant allow *
>> 2018-08-20 08:57:58.754 7f9d85934700 20  allow all
>> 2018-08-20 08:57:58.754 7f9d85934700 10 mon.mon02@1(peon) e4
>> handle_route mon_command_ack([{"prefix": "osd crush set-device-class",
>> "class": "ssd", "ids": ["48"]}]=-22 (22) Invalid argument v46327) v1
>> to unknown.0 -
>> 2018-08-20 08:57:58.785 7f9d85934700 10 mon.mon02@1(peon) e4
>> ms_handle_reset 0x55b4ecf4b200 10.24.52.17:6800/153683
>> 2018-08-20 08:57:58.785 7f9d85934700 10 mon.mon02@1(peon) e4
>> reset/close on session osd.48 10.24.52.17:6800/153683
>> 2018-08-20 08:57:58.785 7f9d85934700 10 mon.mon02@1(peon) e4
>> remove_session 0x55b4ecf86380 osd.48 10.24.52.17:6800/153683 features
>> 0x3ffddff8ffa4fffb
>> 2018-08-20 08:57:58.828 7f9d85934700 20 mon.mon02@1(peon) e4
>> _ms_dispatch existing session 0x55b4ec4828c0 for mon.0
>> 10.24.52.10:6789/0
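
That mon_command_ack coming back with -22 for "osd crush set-device-class"
is the same failure the OSD hits at init. It might be worth reproducing it
by hand, outside the OSD, with something like:

    ceph osd crush class ls
    ceph osd crush set-device-class ssd osd.48

If the CLI returns the same EINVAL, the problem is on the mon/crush side
rather than with the new disk, which again fits a version or feature
mismatch between daemons.
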
>> On Sat, Aug 18, 2018 at 7:54 PM Daznis <daznis@xxxxxxxxx> wrote:
>> >
>> > Hello,
>> >
>> > Not sure about that. I assumed ceph-deploy would do it when the
>> > "--zap-disk" flag is passed. I will try it on Monday and report
>> > back on the progress.
>> > On Sat, Aug 18, 2018 at 3:02 PM Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>> > >
>> > > On Fri, Aug 17, 2018 at 7:05 PM, Daznis <daznis@xxxxxxxxx> wrote:
>> > > > Hello,
>> > > >
>> > > >
>> > > > I have replaced one of our failed OSD drives and created a new OSD
>> > > > with ceph-deploy, but it fails to start.
>> > >
>> > > Is it possible you haven't zapped the journal on nvme0n1p13?
>> > >
>> > >
>> > >
>> > > >
>> > > > Command: ceph-deploy --overwrite-conf osd create --filestore
>> > > > --zap-disk --data /dev/bcache0 --journal /dev/nvme0n1p13 <Hostname>
>> > > >
>> > > > Output of ceph-deploy:
>> > > > [ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
>> > > > [ceph_deploy.cli][INFO  ] Invoked (2.0.1): /usr/bin/ceph-deploy
>> > > > --overwrite-conf osd create --filestore --zap-disk --data /dev/bcache0
>> > > > --journal /dev/nvme0n1p13 <Hostname>
>> > > > [ceph_deploy.cli][INFO  ] ceph-deploy options:
>> > > > [ceph_deploy.cli][INFO  ]  verbose                       : False
>> > > > [ceph_deploy.cli][INFO  ]  bluestore                     : None
>> > > > [ceph_deploy.cli][INFO  ]  cd_conf                       :
>> > > > <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f8622160bd8>
>> > > > [ceph_deploy.cli][INFO  ]  cluster                       : ceph
>> > > > [ceph_deploy.cli][INFO  ]  fs_type                       : xfs
>> > > > [ceph_deploy.cli][INFO  ]  block_wal                     : None
>> > > > [ceph_deploy.cli][INFO  ]  default_release               : False
>> > > > [ceph_deploy.cli][INFO  ]  username                      : None
>> > > > [ceph_deploy.cli][INFO  ]  journal                       : /dev/nvme0n1p13
>> > > > [ceph_deploy.cli][INFO  ]  subcommand                    : create
>> > > > [ceph_deploy.cli][INFO  ]  host                          : <Hostname>
>> > > > [ceph_deploy.cli][INFO  ]  filestore                     : True
>> > > > [ceph_deploy.cli][INFO  ]  func                          : <function
>> > > > osd at 0x7f8622194848>
>> > > > [ceph_deploy.cli][INFO  ]  ceph_conf                     : None
>> > > > [ceph_deploy.cli][INFO  ]  zap_disk                      : True
>> > > > [ceph_deploy.cli][INFO  ]  data                          : /dev/bcache0
>> > > > [ceph_deploy.cli][INFO  ]  block_db                      : None
>> > > > [ceph_deploy.cli][INFO  ]  dmcrypt                       : False
>> > > > [ceph_deploy.cli][INFO  ]  overwrite_conf                : True
>> > > > [ceph_deploy.cli][INFO  ]  dmcrypt_key_dir               :
>> > > > /etc/ceph/dmcrypt-keys
>> > > > [ceph_deploy.cli][INFO  ]  quiet                         : False
>> > > > [ceph_deploy.cli][INFO  ]  debug                         : False
>> > > > [ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data
>> > > > device /dev/bcache0
>> > > > [<Hostname>][DEBUG ] connected to host: <Hostname>
>> > > > [<Hostname>][DEBUG ] detect platform information from remote host
>> > > > [<Hostname>][DEBUG ] detect machine type
>> > > > [<Hostname>][DEBUG ] find the location of an executable
>> > > > [ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.5.1804 Core
>> > > > [ceph_deploy.osd][DEBUG ] Deploying osd to <Hostname>
>> > > > [<Hostname>][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
>> > > > [<Hostname>][DEBUG ] find the location of an executable
>> > > > [ceph_deploy.osd][WARNIN] zapping is no longer supported when preparing
>> > > > [<Hostname>][INFO  ] Running command: /usr/sbin/ceph-volume --cluster
>> > > > ceph lvm create --filestore --data /dev/bcache0 --journal
>> > > > /dev/nvme0n1p13
>> > > > [<Hostname>][DEBUG ] Running command: /bin/ceph-authtool --gen-print-key
>> > > > [<Hostname>][DEBUG ] Running command: /bin/ceph --cluster ceph --name
>> > > > client.bootstrap-osd --keyring
>> > > > /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
>> > > > a503ae5e-b5b9-40d7-b8b3-194f15e52082
>> > > > [<Hostname>][DEBUG ] Running command: /usr/sbin/vgcreate --force --yes
>> > > > ceph-a1ffe5bb-6f06-49c6-8aec-e3eb3a311162 /dev/bcache0
>> > > > [<Hostname>][DEBUG ]  stdout: Physical volume "/dev/bcache0"
>> > > > successfully created.
>> > > > [<Hostname>][DEBUG ]  stdout: Volume group
>> > > > "ceph-a1ffe5bb-6f06-49c6-8aec-e3eb3a311162" successfully created
>> > > > [<Hostname>][DEBUG ] Running command: /usr/sbin/lvcreate --yes -l
>> > > > 100%FREE -n osd-data-a503ae5e-b5b9-40d7-b8b3-194f15e52082
>> > > > ceph-a1ffe5bb-6f06-49c6-8aec-e3eb3a311162
>> > > > [<Hostname>][DEBUG ]  stdout: Logical volume
>> > > > "osd-data-a503ae5e-b5b9-40d7-b8b3-194f15e52082" created.
>> > > > [<Hostname>][DEBUG ] Running command: /bin/ceph-authtool --gen-print-key
>> > > > [<Hostname>][DEBUG ] Running command: /usr/sbin/mkfs -t xfs -f -i
>> > > > size=2048 /dev/ceph-a1ffe5bb-6f06-49c6-8aec-e3eb3a311162/osd-data-a503ae5e-b5b9-40d7-b8b3-194f15e52082
>> > > > [<Hostname>][DEBUG ]  stdout:
>> > > > meta-data=/dev/ceph-a1ffe5bb-6f06-49c6-8aec-e3eb3a311162/osd-data-a503ae5e-b5b9-40d7-b8b3-194f15e52082
>> > > > isize=2048   agcount=4, agsize=244154112 blks
>> > > > [<Hostname>][DEBUG ]          =                       sectsz=512
>> > > > attr=2, projid32bit=1
>> > > > [<Hostname>][DEBUG ]          =                       crc=1
>> > > > finobt=0, sparse=0
>> > > > [<Hostname>][DEBUG ] data     =                       bsize=4096
>> > > > blocks=976616448, imaxpct=5
>> > > > [<Hostname>][DEBUG ]          =                       sunit=0      swidth=0 blks
>> > > > [<Hostname>][DEBUG ] naming   =version 2              bsize=4096
>> > > > ascii-ci=0 ftype=1
>> > > > [<Hostname>][DEBUG ] log      =internal log           bsize=4096
>> > > > blocks=476863, version=2
>> > > > [<Hostname>][DEBUG ]          =                       sectsz=512
>> > > > sunit=0 blks, lazy-count=1
>> > > > [<Hostname>][DEBUG ] realtime =none                   extsz=4096
>> > > > blocks=0, rtextents=0
>> > > > [<Hostname>][DEBUG ] Running command: /bin/mount -t xfs -o
>> > > > rw,noatime,inode64,noquota,nodiratime,logbufs=8,logbsize=256k,attr2
>> > > > /dev/ceph-a1ffe5bb-6f06-49c6-8aec-e3eb3a311162/osd-data-a503ae5e-b5b9-40d7-b8b3-194f15e52082
>> > > > /var/lib/ceph/osd/ceph-48
>> > > > [<Hostname>][DEBUG ] Running command: /bin/chown -R ceph:ceph /dev/nvme0n1p13
>> > > > [<Hostname>][DEBUG ] Running command: /bin/ln -s /dev/nvme0n1p13
>> > > > /var/lib/ceph/osd/ceph-48/journal
>> > > > [<Hostname>][DEBUG ] Running command: /bin/ceph --cluster ceph --name
>> > > > client.bootstrap-osd --keyring
>> > > > /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o
>> > > > /var/lib/ceph/osd/ceph-48/activate.monmap
>> > > > [<Hostname>][DEBUG ]  stderr: got monmap epoch 4
>> > > > [<Hostname>][DEBUG ] Running command: /bin/chown -R ceph:ceph /dev/nvme0n1p13
>> > > > [<Hostname>][DEBUG ] Running command: /bin/chown -R ceph:ceph
>> > > > /var/lib/ceph/osd/ceph-48/
>> > > > [<Hostname>][DEBUG ] Running command: /bin/ceph-osd --cluster ceph
>> > > > --osd-objectstore filestore --mkfs -i 48 --monmap
>> > > > /var/lib/ceph/osd/ceph-48/activate.monmap --keyfile - --osd-data
>> > > > /var/lib/ceph/osd/ceph-48/ --osd-journal
>> > > > /var/lib/ceph/osd/ceph-48/journal --osd-uuid
>> > > > a503ae5e-b5b9-40d7-b8b3-194f15e52082 --setuser ceph --setgroup ceph
>> > > > [<Hostname>][DEBUG ]  stderr: 2018-08-17 18:23:26.067 7f1f62c471c0 -1
>> > > > auth: unable to find a keyring on /var/lib/ceph/osd/ceph-48//keyring:
>> > > > (2) No such file or directory
>> > > > [<Hostname>][DEBUG ]  stderr: 2018-08-17 18:23:26.188 7f1f62c471c0 -1
>> > > > journal read_header error decoding journal header
>> > > > [<Hostname>][DEBUG ]  stderr: 2018-08-17 18:23:26.198 7f1f62c471c0 -1
>> > > > journal do_read_entry(4096): bad header magic
>> > > > [<Hostname>][DEBUG ]  stderr: 2018-08-17 18:23:26.198 7f1f62c471c0 -1
>> > > > journal do_read_entry(4096): bad header magic
>> > > > [<Hostname>][DEBUG ] Running command: /bin/ceph-authtool
>> > > > /var/lib/ceph/osd/ceph-48/keyring --create-keyring --name osd.48
>> > > > --add-key AQB39nZbiXJBMBAAOb9cxepxJrflhSNADuVNSQ==
>> > > > [<Hostname>][DEBUG ]  stdout: creating /var/lib/ceph/osd/ceph-48/keyring
>> > > > [<Hostname>][DEBUG ] added entity osd.48 auth auth(auid =
>> > > > 18446744073709551615 key=AQB39nZbiXJBMBAAOb9cxepxJrflhSNADuVNSQ== with
>> > > > 0 caps)
>> > > > [<Hostname>][DEBUG ] Running command: /bin/chown -R ceph:ceph
>> > > > /var/lib/ceph/osd/ceph-48/keyring
>> > > > [<Hostname>][DEBUG ] --> ceph-volume lvm prepare successful for: /dev/bcache0
>> > > > [<Hostname>][DEBUG ] Running command: /bin/ln -snf /dev/nvme0n1p13
>> > > > /var/lib/ceph/osd/ceph-48/journal
>> > > > [<Hostname>][DEBUG ] Running command: /bin/chown -R ceph:ceph /dev/nvme0n1p13
>> > > > [<Hostname>][DEBUG ] Running command: /bin/systemctl enable
>> > > > ceph-volume@lvm-48-a503ae5e-b5b9-40d7-b8b3-194f15e52082
>> > > > [<Hostname>][DEBUG ]  stderr: Created symlink from
>> > > > /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-48-a503ae5e-b5b9-40d7-b8b3-194f15e52082.service
>> > > > to /usr/lib/systemd/system/ceph-volume@.service.
>> > > > [<Hostname>][DEBUG ] Running command: /bin/systemctl start ceph-osd@48
>> > > > [<Hostname>][DEBUG ] --> ceph-volume lvm activate successful for osd ID: 48
>> > > > [<Hostname>][DEBUG ] --> ceph-volume lvm create successful for: /dev/bcache0
>> > > > [<Hostname>][INFO  ] checking OSD status...
>> > > > [<Hostname>][DEBUG ] find the location of an executable
>> > > > [<Hostname>][INFO  ] Running command: /bin/ceph --cluster=ceph osd
>> > > > stat --format=json
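
Side note: the "zapping is no longer supported when preparing" warning in
the output above means the --zap-disk flag was effectively ignored. With
ceph-deploy 2.x the zap has to be done as a separate step before the
create, roughly:

    ceph-deploy disk zap <Hostname> /dev/nvme0n1p13

(syntax from memory, please check "ceph-deploy disk zap --help" first).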
>> > > >
>> > > >
>> > > > When the OSD service is starting, I'm getting these errors:
>> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 _get_class not permitted to load lua
>> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  1 osd.48 0 warning: got an error
>> > > > loading one or more classes: (1) Operation not permitted
>> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 osd.48 0 crush map has
>> > > > features 288232575208783872, adjusting msgr requires for clients
>> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 osd.48 0 crush map has
>> > > > features 288232575208783872 was 8705, adjusting msgr requires for mons
>> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 osd.48 0 crush map has
>> > > > features 288232575208783872, adjusting msgr requires for osds
>> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 osd.48 0 load_pgs
>> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 osd.48 0 load_pgs opened 0 pgs
>> > > > 2018-08-17 19:12:02.998 7fd06e5a91c0  0 osd.48 0 using
>> > > > weightedpriority op queue with priority op cut off at 64.
>> > > > 2018-08-17 19:12:02.999 7fd06e5a91c0 -1 osd.48 0 log_to_monitors {default=true}
>> > > > 2018-08-17 19:12:03.004 7fd06e5a91c0 -1 osd.48 0
>> > > > mon_cmd_maybe_osd_create fail: '(22) Invalid argument': (22) Invalid
>> > > > argument
>> > > > 2018-08-17 19:12:03.004 7fd06e5a91c0 -1 osd.48 0 init unable to
>> > > > update_crush_device_class: (22) Invalid argument
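
As a possible stop-gap for that last error: the device-class call at init
time is governed by the osd_class_update_on_start option (if I am
remembering the name right), so something like

    [osd]
    osd class update on start = false

in ceph.conf would let the daemon start without attempting
"osd crush set-device-class". That only hides the EINVAL, though; if the
mons and OSDs really are on different versions, that is what needs
fixing first.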
>> > > >
>> > > >
>> > > > So I tried to add the OSD to the CRUSH map with "ceph osd crush add
>> > > > osd.48 4.0 host=<Hostname>", and the same error 22 appears: Error
>> > > > EINVAL: (22) Invalid argument. Trying to set the device class also
>> > > > fails with the same error.
>> > > >
>> > > > If I manually add the OSD to the CRUSH map, crushtool fails to
>> > > > compile the map with errors.
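
If you do end up hand-editing the CRUSH map, the usual round trip is:

    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    # edit crush.txt
    crushtool -c crush.txt -o crush.new
    crushtool -i crush.new --test
    ceph osd setcrushmap -i crush.new

but please paste the exact crushtool errors you get. With a suspected
version mismatch in play I would be wary of injecting a hand-edited map
at all.
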
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


