“podman logs ceph-xxxxxxx-osd-xxx” may contain additional logs.

> On Mar 19, 2021, at 04:29, Philip Brown <pbrown@xxxxxxxxxx> wrote:
>
> I've been banging on my ceph octopus test cluster for a few days now.
> 8 nodes. Each node has 2 SSDs and 8 HDDs.
> They were all autoprovisioned so that each HDD gets an LVM slice of an SSD as a db partition.
>
> service_type: osd
> service_id: osd_spec_default
> placement:
>   host_pattern: '*'
> data_devices:
>   rotational: 1
> db_devices:
>   rotational: 0
>
> Things were going pretty well, until... yesterday... I noticed TWO of the OSDs were "down".
>
> I went to check the logs with
> journalctl -u ceph-xxxx@xxxxxxx
>
> All it showed was a bunch of generic debug info, the fact that it stopped,
> and various automatic attempts to restart,
> but no indication of what was wrong or why the restarts KEEP failing.
>
> Sample output:
>
> systemd[1]: Stopped Ceph osd.33 for e51eb2fa-7f82-11eb-94d5-78e3b5148f00.
> systemd[1]: Starting Ceph osd.33 for e51eb2fa-7f82-11eb-94d5-78e3b5148f00...
> bash[9340]: ceph-e51eb2fa-7f82-11eb-94d5-78e3b5148f00-osd.33-activate
> bash[9340]: WARNING: The same type, major and minor should not be used for multiple devices.
> bash[9340]: WARNING: The same type, major and minor should not be used for multiple devices.
> podman[9369]: 2021-03-07 16:00:15.543010794 -0800 PST m=+0.318475882 container create
> podman[9369]: 2021-03-07 16:00:15.73461926 -0800 PST m=+0.510084288 container init
> .....
> bash[1611473]: --> ceph-volume lvm activate successful for osd ID: 33
> podman[1611501]: 2021-03-18 10:23:02.564242824 -0700 PDT m=+1.379793448 container died
> bash[1611473]: ceph-xx-xx-xx-xx-osd.33
> bash[1611473]: WARNING: The same type, major and minor should not be used for multiple devices.
> (repeat, repeat...)
> podman[1611615]: 2021-03-18 10:23:03.530992487 -0700 PDT m=+0.333130660 container create
> ....
> systemd[1]: Started Ceph osd.33 for xx-xx-xx-xx
> systemd[1]: ceph-xx-xx-xx-xx@osd.33.service: main process exited, code=exited, status=1/FAILURE
> bash[1611797]: ceph-xx-xx-xx-xx-osd.33-deactivate
>
> And eventually it just gives up.
>
> smartctl -a doesn't show any errors on the HDD.
>
> dmesg doesn't show anything.
>
> So... what do I do?
>
> --
> Philip Brown | Sr. Linux System Administrator | Medata, Inc.
> 5 Peters Canyon Rd Suite 250
> Irvine CA 92606
> Office 714.918.1310 | Fax 714.918.1325
> pbrown@xxxxxxxxxx | http://www.medata.com/

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
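
As a rough sketch of the suggestion above (not a definitive recipe): on a cephadm-managed Octopus host, the per-OSD container logs can usually be pulled along these lines, where the cluster FSID and the OSD id are placeholders to be replaced with the real values shown on the affected host.

# List the OSD containers cephadm runs on this host (the output shows the exact container names)
sudo podman ps -a --filter name=osd

# Tail the container log of the failing OSD, using the exact name from the previous command
sudo podman logs --tail 200 ceph-<fsid>-osd.33

# cephadm can also wrap journalctl for a single daemon, passing extra journalctl flags after --
sudo cephadm logs --name osd.33 -- -n 200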
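
For reference, and only as a hedged example with a made-up file name: a drive-group spec like the one quoted above is normally handed to the orchestrator with ceph orch apply, and --dry-run previews which devices it would claim without creating any OSDs.

# Preview which disks on each host the spec would match, without creating OSDs
ceph orch apply osd -i osd_spec_default.yml --dry-run

# Apply (or re-apply) the spec; cephadm then pairs each rotational data device with an SSD db device
ceph orch apply osd -i osd_spec_default.yml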