OSDs will not start

Hey all,

I haven't managed to solve this issue yet.
To simplify things, I'm focusing on restarting one OSD, which crashes shortly
after starting.
As mentioned before, I've ruled out hardware as the cause.
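
For clarity, this is roughly how I start it and watch it crash (this is a
package-based, non-cephadm deployment; osd.14 below is just a placeholder ID):

    # start the OSD and follow its journal until it crashes
    sudo systemctl restart ceph-osd@14
    sudo journalctl -u ceph-osd@14 -f

    # the corresponding OSD log file
    less /var/log/ceph/ceph-osd.14.log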

I'm not a dev, but looking at the log, my error occurs at this point in the code:
https://github.com/ceph/ceph/blob/quincy/src/os/bluestore/BlueFS.cc#L1419
Any suggestions on a way forward would be greatly appreciated.

I've tried the *ceph-bluestore-tool* to repair / fsck / etc., but all of them
fail with the BlueFS replay error.
If I could use the *ceph-objectstore-tool* to export the shard of the PG
that's down I'd try that, but it also fails with the same BlueFS replay
error.
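
To be concrete, these are roughly the invocations I mean (the OSD ID, PG ID
and export path are placeholders; the OSD is stopped first):

    # the offline tools need the OSD to be stopped
    sudo systemctl stop ceph-osd@14

    # consistency check / repair -- both abort during BlueFS replay
    sudo ceph-bluestore-tool fsck   --path /var/lib/ceph/osd/ceph-14
    sudo ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-14

    # attempted export of the down PG's shard -- fails the same way
    sudo ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-14 \
        --pgid 7.1as1 --op export --file /root/pg-7.1as1.export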

I've added the output of the OSD journalctl and the OSD log below in case
it helps identify anything obvious.
I also set debug bluefs = 20, as I saw suggested in another post.

https://pastebin.com/3PkCabdf
https://pastebin.com/BT9bnhSb
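
For reference, the logging change is just the standard debug setting, e.g. in
ceph.conf before restarting the OSD (osd.14 again a placeholder; the
mon-managed config store would be the equivalent route):

    # /etc/ceph/ceph.conf
    [osd]
        debug bluefs = 20

    # or equivalently, via the centralized config:
    #   ceph config set osd.14 debug_bluefs 20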


Kind regards
Geoffrey Rhodes


On Wed, 25 Jan 2023 at 12:44, Geoffrey Rhodes <geoffrey@xxxxxxxxxxxxx>
wrote:

> Good day all,
>
> I've an issue with a few OSDs (on two different nodes) that attempt to
> start but fail / crash quite quickly. They are all LVM disks.
> I've tried upgrading the software and running health checks on the hardware
> (nodes and disks), and there don't seem to be any issues there.
>
> Recently I've had a few "other" disks physically fail in the cluster and
> now have one PG down which is blocking some IO on CephFS.
> I've added the output of the osd journalctl and the osd log below in case
> it's helpful to identify anything obvious.
> I also set debug bluefs = 20, as I saw suggested in another post.
> I recently manually upgraded this node to 17.2.0 before the problem
> began, and later to 17.2.5. The other OSDs in this node start and run fine.
>
> The other node (15.2.17) also has a few OSDs that will not start and some
> that run without issue.
> Could anyone point me in the right direction to investigate and solve my
> OSD issues?
>
> https://pastebin.com/3PkCabdf
> https://pastebin.com/BT9bnhSb
>
> Production system mainly used for CephFS
> OS: Ubuntu 20.04.5 LTS
> Ceph versions: 15.2.17 - Octopus (one OSD node manually upgraded to
> 17.2.5 - Quincy)
> Erasure data pool (K=4, M=2) - The journals for each OSD are co-located
> on each drive
>
> Kind regards
> Geoffrey Rhodes
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


