Re: OSDs failing to start after host reboot

On Mon, Jan 29, 2018 at 10:55 AM, Andre Goree <andre@xxxxxxxxxx> wrote:
> On my OSD node that I built with ceph-ansible, the OSDs are failing to start
> after a reboot.

This is not uncommon for ceph-disk, unfortunately, and it is one of the
reasons we introduced ceph-volume. There are a few components that can
cause this; you may find that rebooting your node yields different
results, and sometimes other OSDs will come up (or even all of them!).
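
As a stopgap, instead of rebooting again you can usually just retry the
activation by hand. Roughly something like the following (the unit name
is taken from your log below, adjust as needed) -- note this goes
through the same ceph-disk code path, so it can still hit the same race:

    # retry activation of the one failed partition
    systemctl restart ceph-disk@dev-nvme0n1p16.service

    # or have ceph-disk attempt to activate everything it can find
    ceph-disk activate-all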

If you search the tracker, or even this mailing list, you will see that
this is nothing new.

ceph-ansible can also deploy OSDs with ceph-volume, which doesn't
suffer from the same caveats, so you might want to try it out (if
possible).


>
>
> Errors on boot:
>
>     ~# systemctl status ceph-disk@dev-nvme0n1p16.service
>     ● ceph-disk@dev-nvme0n1p16.service - Ceph disk activation:
> /dev/nvme0n1p16
>     Loaded: loaded (/lib/systemd/system/ceph-disk@.service; static; vendor
> preset: enabled)
>     Active: failed (Result: exit-code) since Mon 2018-01-29 10:14:21 EST;
> 5min ago
>     Process: 2819 ExecStart=/bin/sh -c timeout $CEPH_DISK_TIMEOUT flock
> /var/lock/ceph-disk-$(basename %f) /usr/sbin/ceph-disk --verbose
> --log-stdout trigger --sync %f (code=exited, status=1/FAILURE)
>     Main PID: 2819 (code=exited, status=1/FAILURE)
>
>     Jan 29 10:14:21 osd-08 sh[2819]: main(sys.argv[1:])
>     Jan 29 10:14:21 osd-08 sh[2819]: File
> "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 5674, in main
>     Jan 29 10:14:21 osd-08 sh[2819]: args.func(args)
>     Jan 29 10:14:21 osd-08 sh[2819]: File
> "/usr/lib/python2.7/dist-packages/ceph_disk/main.py", line 4874, in
> main_trigger
>     Jan 29 10:14:21 osd-08 sh[2819]: raise Error('return code ' + str(ret))
>     Jan 29 10:14:21 osd-08 sh[2819]: ceph_disk.main.Error: Error: return
> code 1
>     Jan 29 10:14:21 osd-08 systemd[1]: ceph-disk@dev-nvme0n1p16.service:
> Main process exited, code=exited, status=1/FAILURE
>     Jan 29 10:14:21 osd-08 systemd[1]: Failed to start Ceph disk activation:
> /dev/nvme0n1p16.
>     Jan 29 10:14:21 osd-08 systemd[1]: ceph-disk@dev-nvme0n1p16.service:
> Unit entered failed state.
>     Jan 29 10:14:21 osd-08 systemd[1]: ceph-disk@dev-nvme0n1p16.service:
> Failed with result 'exit-code'.
>
> When I try to run "systemctl start ceph-disk@dev-nvme0n1p16.service"
> manually, the command appears to just hang:
>
>     Jan 29 10:33:47 osd-08 systemd[1]: Starting Ceph disk activation:
> /dev/nvme0n1p16...
>     Jan 29 10:33:47 osd-08 sh[8093]:
> /usr/lib/python2.7/dist-packages/ceph_disk/main.py:5653: UserWarning:
>     Jan 29 10:33:47 osd-08 sh[8093]:
> *******************************************************************************
>     Jan 29 10:33:47 osd-08 sh[8093]: This tool is now deprecated in favor of
> ceph-volume.
>     Jan 29 10:33:47 osd-08 sh[8093]: It is recommended to use ceph-volume
> for OSD deployments. For details see:
>     Jan 29 10:33:47 osd-08 sh[8093]:
> http://docs.ceph.com/docs/master/ceph-volume/#migrating
>     Jan 29 10:33:47 osd-08 sh[8093]:
> *******************************************************************************
>     Jan 29 10:33:47 osd-08 sh[8093]: warnings.warn(DEPRECATION_WARNING)
>     Jan 29 10:33:47 osd-08 sh[8093]: main_trigger: main_trigger:
> Namespace(cluster='ceph', dev='/dev/nvme0n1p16', dmcrypt=None,
> dmcrypt_key_dir='/etc/ceph/dmcrypt-keys', func=<function main_trigger at
> 0x7efdbf275a28>, log_stdout=True, prepend_to_path='/usr/bin',
> prog='ceph-disk', setgroup=None, setuser=None, statedir='/var/lib/ceph',
> sync=True, sysconfdir='/etc/ceph', verbose=True)
>     Jan 29 10:33:47 osd-08 sh[8093]: command: Running command: /sbin/init
> --version
>     Jan 29 10:33:47 osd-08 sh[8093]: command_check_call: Running command:
> /bin/chown ceph:ceph /dev/nvme0n1p16
>     Jan 29 10:33:47 osd-08 sh[8093]: command: Running command: /sbin/blkid
> -o udev -p /dev/nvme0n1p16
>     Jan 29 10:33:47 osd-08 sh[8093]: command: Running command: /sbin/blkid
> -o udev -p /dev/nvme0n1p16
>     Jan 29 10:33:47 osd-08 sh[8093]: main_trigger: trigger /dev/nvme0n1p16
> parttype 86a32090-3647-40b9-bbbd-38d8c573aa86 uuid
> 4bc8ecb2-8c83-4006-9d15-09d213952077
>     Jan 29 10:33:47 osd-08 sh[8093]: command: Running command:
> /usr/sbin/ceph-disk --verbose activate-block --dmcrypt /dev/nvme0n1p16
>
> I've searched around and found that perhaps my permissions are off, but all
> appears correct on the disks controlled by ceph:
>
>     brw-rw---- 1 root disk 8, 0 Jan 29 10:14 /dev/sda
>     brw-rw---- 1 ceph ceph 8, 1 Jan 29 10:14 /dev/sda1
>     brw-rw---- 1 ceph ceph 8, 2 Jan 29 10:14 /dev/sda2
>     brw-rw---- 1 ceph ceph 8, 5 Jan 29 10:14 /dev/sda5
>     brw-rw---- 1 root disk 8, 16 Jan 29 10:14 /dev/sdb
>     brw-rw---- 1 ceph ceph 8, 17 Jan 29 10:14 /dev/sdb1
>     brw-rw---- 1 ceph ceph 8, 18 Jan 29 10:14 /dev/sdb2
>     brw-rw---- 1 ceph ceph 8, 21 Jan 29 10:14 /dev/sdb5
>     brw-rw---- 1 root disk 8, 32 Jan 29 10:14 /dev/sdc
>     brw-rw---- 1 ceph ceph 8, 33 Jan 29 10:14 /dev/sdc1
>     brw-rw---- 1 ceph ceph 8, 34 Jan 29 10:14 /dev/sdc2
>     brw-rw---- 1 ceph ceph 8, 37 Jan 29 10:14 /dev/sdc5
>     brw-rw---- 1 root disk 8, 48 Jan 29 10:14 /dev/sdd
>     brw-rw---- 1 ceph ceph 8, 49 Jan 29 10:14 /dev/sdd1
>     brw-rw---- 1 ceph ceph 8, 50 Jan 29 10:14 /dev/sdd2
>     brw-rw---- 1 ceph ceph 8, 53 Jan 29 10:14 /dev/sdd5
>     brw-rw---- 1 root disk 8, 64 Jan 29 10:14 /dev/sde
>     brw-rw---- 1 ceph ceph 8, 65 Jan 29 10:14 /dev/sde1
>     brw-rw---- 1 ceph ceph 8, 66 Jan 29 10:14 /dev/sde2
>     brw-rw---- 1 ceph ceph 8, 69 Jan 29 10:14 /dev/sde5
>     brw-rw---- 1 root disk 8, 80 Jan 29 10:14 /dev/sdf
>     brw-rw---- 1 ceph ceph 8, 81 Jan 29 10:14 /dev/sdf1
>     brw-rw---- 1 ceph ceph 8, 82 Jan 29 10:14 /dev/sdf2
>     brw-rw---- 1 ceph ceph 8, 85 Jan 29 10:14 /dev/sdf5
>     brw-rw---- 1 root disk 8, 96 Jan 29 10:14 /dev/sdg
>     brw-rw---- 1 ceph ceph 8, 97 Jan 29 10:14 /dev/sdg1
>     brw-rw---- 1 ceph ceph 8, 98 Jan 29 10:14 /dev/sdg2
>     brw-rw---- 1 ceph ceph 8, 101 Jan 29 10:14 /dev/sdg5
>     brw-rw---- 1 root disk 8, 112 Jan 29 10:14 /dev/sdh
>     brw-rw---- 1 ceph ceph 8, 113 Jan 29 10:14 /dev/sdh1
>     brw-rw---- 1 ceph ceph 8, 114 Jan 29 10:14 /dev/sdh2
>     brw-rw---- 1 ceph ceph 8, 117 Jan 29 10:14 /dev/sdh5
>     brw-rw---- 1 root disk 8, 128 Jan 29 10:14 /dev/sdi
>     brw-rw---- 1 root disk 8, 129 Jan 29 10:14 /dev/sdi1
>     brw-rw---- 1 root disk 8, 130 Jan 29 10:14 /dev/sdi2
>     brw-rw---- 1 root disk 8, 144 Jan 29 10:14 /dev/sdj
>     brw-rw---- 1 root disk 8, 145 Jan 29 10:14 /dev/sdj1
>     brw-rw---- 1 root disk 8, 146 Jan 29 10:14 /dev/sdj2
>     brw-rw---- 1 root disk 8, 160 Jan 29 10:14 /dev/sdk
>     brw-rw---- 1 ceph ceph 8, 161 Jan 29 10:14 /dev/sdk1
>     brw-rw---- 1 ceph ceph 8, 162 Jan 29 10:14 /dev/sdk2
>     brw-rw---- 1 ceph ceph 8, 165 Jan 29 10:14 /dev/sdk5
>     brw-rw---- 1 root disk 8, 176 Jan 29 10:14 /dev/sdl
>     brw-rw---- 1 ceph ceph 8, 177 Jan 29 10:14 /dev/sdl1
>     brw-rw---- 1 ceph ceph 8, 178 Jan 29 10:14 /dev/sdl2
>     brw-rw---- 1 ceph ceph 8, 181 Jan 29 10:14 /dev/sdl5
>     brw-rw---- 1 root disk 8, 192 Jan 29 10:14 /dev/sdm
>     brw-rw---- 1 ceph ceph 8, 193 Jan 29 10:14 /dev/sdm1
>     brw-rw---- 1 ceph ceph 8, 194 Jan 29 10:14 /dev/sdm2
>     brw-rw---- 1 ceph ceph 8, 197 Jan 29 10:14 /dev/sdm5
>     brw-rw---- 1 root disk 8, 208 Jan 29 10:14 /dev/sdn
>     brw-rw---- 1 ceph ceph 8, 209 Jan 29 10:14 /dev/sdn1
>     brw-rw---- 1 ceph ceph 8, 210 Jan 29 10:14 /dev/sdn2
>     brw-rw---- 1 ceph ceph 8, 213 Jan 29 10:14 /dev/sdn5
>
>     crw------- 1 root root 247, 0 Jan 29 10:14 /dev/nvme0
>     brw-rw---- 1 root disk 259, 0 Jan 29 10:14 /dev/nvme0n1
>     brw-rw---- 1 ceph ceph 259, 1 Jan 29 10:14 /dev/nvme0n1p1
>     brw-rw---- 1 ceph ceph 259, 10 Jan 29 10:14 /dev/nvme0n1p10
>     brw-rw---- 1 ceph ceph 259, 11 Jan 29 10:14 /dev/nvme0n1p11
>     brw-rw---- 1 ceph ceph 259, 12 Jan 29 10:14 /dev/nvme0n1p12
>     brw-rw---- 1 ceph ceph 259, 13 Jan 29 10:14 /dev/nvme0n1p13
>     brw-rw---- 1 ceph ceph 259, 14 Jan 29 10:14 /dev/nvme0n1p14
>     brw-rw---- 1 ceph ceph 259, 15 Jan 29 10:14 /dev/nvme0n1p15
>     brw-rw---- 1 ceph ceph 259, 16 Jan 29 10:14 /dev/nvme0n1p16
>     brw-rw---- 1 ceph ceph 259, 17 Jan 29 10:14 /dev/nvme0n1p17
>     brw-rw---- 1 ceph ceph 259, 18 Jan 29 10:14 /dev/nvme0n1p18
>     brw-rw---- 1 ceph ceph 259, 19 Jan 29 10:14 /dev/nvme0n1p19
>     brw-rw---- 1 ceph ceph 259, 2 Jan 29 10:14 /dev/nvme0n1p2
>     brw-rw---- 1 ceph ceph 259, 20 Jan 29 10:14 /dev/nvme0n1p20
>     brw-rw---- 1 ceph ceph 259, 21 Jan 29 10:14 /dev/nvme0n1p21
>     brw-rw---- 1 ceph ceph 259, 22 Jan 29 10:14 /dev/nvme0n1p22
>     brw-rw---- 1 ceph ceph 259, 23 Jan 29 10:14 /dev/nvme0n1p23
>     brw-rw---- 1 ceph ceph 259, 24 Jan 29 10:14 /dev/nvme0n1p24
>     brw-rw---- 1 ceph ceph 259, 3 Jan 29 10:14 /dev/nvme0n1p3
>     brw-rw---- 1 ceph ceph 259, 4 Jan 29 10:14 /dev/nvme0n1p4
>     brw-rw---- 1 ceph ceph 259, 5 Jan 29 10:14 /dev/nvme0n1p5
>     brw-rw---- 1 ceph ceph 259, 6 Jan 29 10:14 /dev/nvme0n1p6
>     brw-rw---- 1 ceph ceph 259, 7 Jan 29 10:14 /dev/nvme0n1p7
>     brw-rw---- 1 ceph ceph 259, 8 Jan 29 10:14 /dev/nvme0n1p8
>     brw-rw---- 1 ceph ceph 259, 9 Jan 29 10:14 /dev/nvme0n1p9
>
> Any ideas? I can provide more logs if necessary.
>
>
> --
> Andre Goree
> -=-=-=-=-=-
> Email     - andre at drenet.net
> Website   - http://blog.drenet.net
> PGP key   - http://www.drenet.net/pubkey.html
> -=-=-=-=-=-
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



