Re: Recreating a purged OSD fails

Looks like a known issue tracked by

http://tracker.ceph.com/issues/24423

http://tracker.ceph.com/issues/24599
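
A quick way to see whether you are hitting the same leftover-state problem those tickets describe might be to check what the cluster still knows about that id (just my suggestion, not the workaround from the trackers):

# ceph osd dump | grep osd.19
# ceph auth get osd.19
# ceph osd metadata 19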


Regards,

Igor


On 6/27/2018 9:40 AM, Steffen Winther Sørensen wrote:
List,

Had a failed disk behind an OSD in a Mimic 13.2.0 cluster, so I tried following the documentation on removing an OSD.

I did:

# ceph osd crush reweight osd.19 0
waited for rebalancing to finish, then continued:
# ceph osd out 19
# systemctl stop ceph-osd@19
# ceph osd purge 19 --yes-i-really-mean-it

Verified that osd.19 was out of the map with ceph osd tree.
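
(In hindsight, the tidier pre-purge check is probably something like the following; safe-to-destroy is my guess at the right tool here, it isn't in the doc's steps:)

# ceph osd safe-to-destroy 19
# ceph osd tree | grep osd.19    # no output once it is gone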

To my surprise, this tmpfs was still mounted:
tmpfs                    7.8G   48K  7.8G   1% /var/lib/ceph/osd/ceph-19
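
I assume that leftover mount simply has to be cleaned up by hand before the directory is reused:

# umount /var/lib/ceph/osd/ceph-19
# mount | grep ceph-19    # should return nothing afterwards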

Replaced the failed drive and then attempted:

# ceph-volume lvm zap /dev/sdh
# ceph-volume lvm create --osd-id 19 --data /dev/sdh
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd tree -f json
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 5352d594-aa19-4147-a884-ca2c5775aa1b
Running command: /usr/sbin/vgcreate --force --yes ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e /dev/sdh
 stderr: WARNING: Device for PV CdiFOZ-n89Z-G5EF-JBBV-GFfU-bDRV-VJQHho not found or rejected by a filter.
 stderr: WARNING: Device for PV CdiFOZ-n89Z-G5EF-JBBV-GFfU-bDRV-VJQHho not found or rejected by a filter.
 stderr: /dev/ceph-a6541e3f-0a7f-4268-823c-668c515b5edc/osd-block-efae9323-b934-408e-a4f9-1e1f62d88f2d: read failed after 0 of 4096 at 0: Input/output error
  /dev/ceph-a6541e3f-0a7f-4268-823c-668c515b5edc/osd-block-efae9323-b934-408e-a4f9-1e1f62d88f2d: read failed after 0 of 4096 at 146775408640: Input/output error
  /dev/ceph-a6541e3f-0a7f-4268-823c-668c515b5edc/osd-block-efae9323-b934-408e-a4f9-1e1f62d88f2d: read failed after 0 of 4096 at 146775465984: Input/output error
 stderr: /dev/ceph-a6541e3f-0a7f-4268-823c-668c515b5edc/osd-block-efae9323-b934-408e-a4f9-1e1f62d88f2d: read failed after 0 of 4096 at 4096: Input/output error
 stderr: WARNING: Device for PV CdiFOZ-n89Z-G5EF-JBBV-GFfU-bDRV-VJQHho not found or rejected by a filter.
 stdout: Physical volume "/dev/sdh" successfully created.
 stdout: Volume group "ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e" successfully created
Running command: /usr/sbin/lvcreate --yes -l 100%FREE -n osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e
 stdout: Logical volume "osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b" created.
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-19
Running command: /bin/chown -R ceph:ceph /dev/dm-9
Running command: /bin/ln -s /dev/ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e/osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b /var/lib/ceph/osd/ceph-19/block
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-19/activate.monmap
 stderr: got monmap epoch 1
Running command: /bin/ceph-authtool /var/lib/ceph/osd/ceph-19/keyring --create-keyring --name osd.19 --add-key AQBY1TBbN8I+HxAAMHGWKLgJugmtzdqllQh5sA==
 stdout: creating /var/lib/ceph/osd/ceph-19/keyring
 stdout: added entity osd.19 auth auth(auid = 18446744073709551615 key=AQBY1TBbN8I+HxAAMHGWKLgJugmtzdqllQh5sA== with 0 caps)
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-19/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-19/
Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 19 --monmap /var/lib/ceph/osd/ceph-19/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-19/ --osd-uuid 5352d594-aa19-4147-a884-ca2c5775aa1b --setuser ceph --setgroup ceph
--> ceph-volume lvm prepare successful for: /dev/sdh
Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e/osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b --path /var/lib/ceph/osd/ceph-19
Running command: /bin/ln -snf /dev/ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e/osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b /var/lib/ceph/osd/ceph-19/block
Running command: /bin/chown -R ceph:ceph /dev/dm-9
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-19
Running command: /bin/systemctl enable ceph-volume@lvm-19-5352d594-aa19-4147-a884-ca2c5775aa1b
 stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-v
Running command: /bin/systemctl start ceph-osd@19
--> ceph-volume lvm activate successful for osd ID: 19
--> ceph-volume lvm create successful for: /dev/sdh
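
The new volume also shows up as expected (ceph-volume lvm list is just my sanity check here, not part of the doc's steps):

# ceph-volume lvm list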

Verified that osd.19 was back in the map with:
# ceph osd tree
ID CLASS WEIGHT  TYPE NAME      STATUS REWEIGHT PRI-AFF 
-1       3.20398 root default                           
-9       0.80099     host n1                            
18   hdd 0.13350         osd.18     up  1.00000 1.00000 
19   hdd 0.13350         osd.19   down        0 1.00000 
20   hdd 0.13350         osd.20     up  1.00000 1.00000 
21   hdd 0.13350         osd.21     up  1.00000 1.00000 
22   hdd 0.13350         osd.22     up  1.00000 1.00000 
23   hdd 0.13350         osd.23     up  1.00000 1.00000 
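
Presumably it also needs to be marked back in once it actually starts, since the REWEIGHT column is still 0 from the earlier ceph osd out:

# ceph osd in 19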

Only it fails to launch:
# systemctl start ceph-osd@19
# systemctl status ceph-osd@19
● ceph-osd@19.service - Ceph object storage daemon osd.19
   Loaded: loaded (/usr/lib/systemd/system/ceph-osd@.service; disabled; vendor preset: disabled)
   Active: activating (auto-restart) (Result: signal) since Mon 2018-06-25 13:44:35 CEST; 3s ago
  Process: 2046453 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=killed, signal=ABRT)
  Process: 2046447 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 2046453 (code=killed, signal=ABRT)

Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 8: (OSD::handle_osd_map(MOSDMap*)+0x1020) [0x56353eac71f0]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 9: (OSD::_dispatch(Message*)+0xa1) [0x56353eac9d21]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 10: (OSD::ms_dispatch(Message*)+0x56) [0x56353eaca066]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 11: (DispatchQueue::entry()+0xb5a) [0x7f302acce74a]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 12: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f302ad6ef2d]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 13: (()+0x7e25) [0x7f30277b0e25]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: 14: (clone()+0x6d) [0x7f30268a1bad]
Jun 25 13:44:35 n1.sprawl.dk ceph-osd[2046453]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Jun 25 13:44:35 n1.sprawl.dk systemd[1]: Unit ceph-osd@19.service entered failed state.
Jun 25 13:44:35 n1.sprawl.dk systemd[1]: ceph-osd@19.service failed.
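
(The journal lines above are truncated; the full backtrace below comes from the OSD log, pulled with something like:)

# journalctl -u ceph-osd@19 --no-pager -n 200
# less /var/log/ceph/ceph-osd.19.log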

The osd.19 log shows:

--- begin dump of recent events ---
     0> 2018-06-25 13:48:47.139 7fc6b91c5700 -1 *** Caught signal (Aborted) **
 in thread 7fc6b91c5700 thread_name:ms_dispatch

 ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic (stable)
 1: (()+0x8e1870) [0x55da2ff6e870]
 2: (()+0xf6d0) [0x7fc6c97ba6d0]
 3: (gsignal()+0x37) [0x7fc6c87db277]
 4: (abort()+0x148) [0x7fc6c87dc968]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x25d) [0x7fc6ccc5a69d]
 6: (()+0x286727) [0x7fc6ccc5a727]
 7: (OSDService::get_map(unsigned int)+0x4a) [0x55da2faa3dda]
 8: (OSD::handle_osd_map(MOSDMap*)+0x1020) [0x55da2fa511f0]
 9: (OSD::_dispatch(Message*)+0xa1) [0x55da2fa53d21]
 10: (OSD::ms_dispatch(Message*)+0x56) [0x55da2fa54066]
 11: (DispatchQueue::entry()+0xb5a) [0x7fc6cccd074a]
 12: (DispatchQueue::DispatchThread::entry()+0xd) [0x7fc6ccd70f2d]
 13: (()+0x7e25) [0x7fc6c97b2e25]
 14: (clone()+0x6d) [0x7fc6c88a3bad]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
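
Unless someone has a better idea, my fallback is to wipe the new volumes and retry from a clean purge without forcing the old id; roughly (LV/VG names taken from the create output above, and I have not verified this actually avoids the crash):

# systemctl stop ceph-osd@19
# ceph-volume lvm zap /dev/ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e/osd-block-5352d594-aa19-4147-a884-ca2c5775aa1b
# vgremove -f ceph-a2ebf47b-fa4a-43ce-b087-12dbafb5796e
# ceph osd purge 19 --yes-i-really-mean-it
# ceph-volume lvm create --data /dev/sdh    # let it pick a fresh id this time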

Any hints would be appreciated, TIA!

/Steffen


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
