Re: fyi: Luminous 12.2.7 pulled wrong osd disk, resulted in node down

On Wed, Aug 1, 2018 at 10:38 PM, Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx> wrote:
>
>
> Today we pulled the wrong disk from a Ceph node, and that made the whole
> node go down and become unresponsive, even to a simple ping. I cannot
> find much about this in the log files, but I suspect that the
> /usr/bin/ceph-osd process caused a kernel panic.

That would most likely be a kernel bug. Someone would probably need to
look at a vmcore to work out what happened.

>
> Linux c01 3.10.0-693.11.1.el7.x86_64
> CentOS Linux release 7.4.1708 (Core)
> libcephfs2-12.2.7-0.el7.x86_64
> ceph-mon-12.2.7-0.el7.x86_64
> nfs-ganesha-ceph-2.6.1-0.1.el7.x86_64
> ceph-selinux-12.2.7-0.el7.x86_64
> ceph-osd-12.2.7-0.el7.x86_64
> ceph-mgr-12.2.7-0.el7.x86_64
> ceph-12.2.7-0.el7.x86_64
> python-cephfs-12.2.7-0.el7.x86_64
> ceph-common-12.2.7-0.el7.x86_64
> ceph-mds-12.2.7-0.el7.x86_64
> ceph-radosgw-12.2.7-0.el7.x86_64
> ceph-base-12.2.7-0.el7.x86_64
>
> Aug  1 11:01:01 c02 systemd: Started Session 8331 of user root.
> Aug  1 11:01:01 c02 systemd: Starting Session 8331 of user root.
> Aug  1 11:03:08 c03 kernel: XFS (sdb1): xfs_do_force_shutdown(0x2)
> called from line 1200 of file fs/xfs/xfs_log.c.  Return address =
> 0xffffffffc0232e60
> Aug  1 11:03:33 c03 kernel: XFS (sdf1): xfs_do_force_shutdown(0x2)
> called from line 1200 of file fs/xfs/xfs_log.c.  Return address =
> 0xffffffffc0232e60
> Aug  1 11:03:34 c02 kernel: libceph: osd5 down
> Aug  1 11:05:04 c02 ceph-osd: 2018-08-01 11:05:04.656719 7f1f1e764700 -1
> osd.9 22452 heartbeat_check: no reply from 192.168.10.113:6816 osd.12
> since back 2018-08-01 11:04:44.365869 front 2018-08-01 11:04:44.365869
> (cutoff 2018-08-01 11:04:44.656717)
> Aug  1 11:05:04 c02 ceph-osd: 2018-08-01 11:05:04.656746 7f1f1e764700 -1
> osd.9 22452 heartbeat_check: no reply from 192.168.10.113:6812 osd.14
> since back 2018-08-01 11:04:44.365869 front 2018-08-01 11:04:44.365869
> (cutoff 2018-08-01 11:04:44.656717)
> Aug  1 11:05:04 c02 ceph-osd: 2018-08-01 11:05:04.656761 7f1f1e764700 -1
> osd.9 22452 heartbeat_check: no reply from 192.168.10.113:6804 osd.15
> since back 2018-08-01 11:04:44.365869 front 2018-08-01 11:04:44.365869
> (cutoff 2018-08-01 11:04:44.656717)
> Aug  1 11:05:04 c02 ceph-osd: 2018-08-01 11:05:04.656773 7f1f1e764700 -1
> osd.9 22452 heartbeat_check: no reply from 192.168.10.113:6814 osd.16
> since back 2018-08-01 11:04:44.365869 front 2018-08-01 11:04:44.365869
> (cutoff 2018-08-01 11:04:44.656717)
> Aug  1 11:05:05 c02 ceph-osd: 2018-08-01 11:05:05.657034 7f1f1e764700 -1
> osd.9 22452 heartbeat_check: no reply from 192.168.10.113:6816 osd.12
> since back 2018-08-01 11:04:44.365869 front 2018-08-01 11:04:44.365869
> (cutoff 2018-08-01 11:04:45.657031)
> Aug  1 11:05:05 c02 ceph-osd: 2018-08-01 11:05:05.657067 7f1f1e764700 -1
> osd.9 22452 heartbeat_check: no reply from 192.168.10.113:6812 osd.14
> since back 2018-08-01 11:04:44.365869 front 2018-08-01 11:04:44.365869
> (cutoff 2018-08-01 11:04:45.657031)
> Aug  1 11:05:05 c02 ceph-osd: 2018-08-01 11:05:05.657079 7f1f1e764700 -1
> osd.9 22452 heartbeat_check: no reply from 192.168.10.113:6804 osd.15
> since back 2018-08-01 11:04:44.365869 front 2018-08-01 11:04:44.365869
> (cutoff 2018-08-01 11:04:45.657031)
> Aug  1 11:05:05 c02 ceph-osd: 2018-08-01 11:05:05.657089 7f1f1e764700 -1
> osd.9 22452 heartbeat_check: no reply from 192.168.10.113:6814 osd.16
> since back 2018-08-01 11:04:44.365869 front 2018-08-01 11:04:44.365869
> (cutoff 2018-08-01 11:04:45.657031)
> Aug  1 11:05:06 c02 ceph-osd: 2018-08-01 11:05:06.420853 7f9e2612b700 -1
> osd.20 22452 heartbeat_check: no reply from 192.168.10.113:6810 osd.13
> since back 2018-08-01 11:04:45.894147 front 2018-08-01 11:04:45.894147
> (cutoff 2018-08-01 11:04:46.420850)
>
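
To see at a glance which peers stopped answering heartbeats, the heartbeat_check lines above can be grouped by peer address. What follows is only a rough sketch, not a Ceph tool: the function name and the regular expression are my own assumptions, and the pattern is derived solely from the line format in this excerpt, so other releases may log differently.

```python
import re
from collections import defaultdict

# Assumed pattern, based only on the heartbeat_check lines quoted in this
# thread; it is not taken from the Ceph source or documentation.
HEARTBEAT_RE = re.compile(
    r"(?P<reporter>osd\.\d+) \d+ heartbeat_check: no reply from "
    r"(?P<host>\d+\.\d+\.\d+\.\d+):(?P<port>\d+) (?P<peer>osd\.\d+)"
)


def summarize_heartbeat_failures(log_text):
    """Map each peer host address to the OSDs reported as not replying.

    Hypothetical helper for illustration only.
    """
    # Quoted mail prefixes lines with "> " and wraps long log entries,
    # so strip the prefix and join everything into one string first.
    flat = " ".join(
        line.lstrip("> ").strip() for line in log_text.splitlines()
    )
    silent = defaultdict(set)
    for match in HEARTBEAT_RE.finditer(flat):
        silent[match.group("host")].add(match.group("peer"))
    # Sort peers by their numeric OSD id for stable, readable output.
    return {
        host: sorted(peers, key=lambda p: int(p.split(".")[1]))
        for host, peers in silent.items()
    }
```

Run against the excerpt above, this would group osd.12 through osd.16 under 192.168.10.113, which suggests that every OSD on that one host went silent together. That fits the whole node hanging rather than a single OSD failing, but it does not say why; that still needs the vmcore.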
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Cheers,
Brad


