OK, thanks! This is the same package as in the Octopus images, so I
would expect Pacific to fail just as spectacularly.
What's the best way to have this fixed? New issue on the Ceph tracker?
I understand the Ceph images use CentOS packages, so should they be
poked as well?
// Best wishes; Johan
On 2021-07-27 23:48, Eugen Block wrote:
Alright, it's great that you could fix it!
In my one-node test cluster (Pacific) I see this smartctl version:
[ceph: root@pacific /]# rpm -q smartmontools
smartmontools-7.1-1.el8.x86_64
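(If you want to check what your own containers ship, something like
this should work from any cephadm host; just a sketch:
# cephadm shell -- rpm -q smartmontools
i.e. it runs the query inside whatever image the cluster is
currently using.)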
Zitat von Johan Hattne <johan@xxxxxxxxx>:
Thanks a lot, Eugen! I had not found those threads, but I did
eventually recover; details below. And yes, this is a toy size-2
cluster with two OSDs, but I suspect I would have seen the same
problem on a more reasonable setup, since this whole mess was caused
by Octopus's smartmontools not playing nice with the NVMes.
Just as in the previous thread Eugen provided, I got an OSD map from
the monitors:
# ceph osd getmap 4372 > /tmp/osd_map_4372
copied it to the OSD hosts and imported it:
# CEPH_ARGS="--bluestore-ignore-data-csum" ceph-objectstore-tool
--data-path /var/lib/ceph/osd/ceph-0/ --op set-osdmap --file
/tmp/osd_map_4372
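Had more epochs been missing, I imagine the same getmap/set-osdmap
steps could simply be looped over the range; an untested sketch,
assuming the shell on the OSD host can reach the monitors:
# for e in $(seq 4372 4378); do \
    ceph osd getmap "$e" -o /tmp/osd_map_"$e"; \
    CEPH_ARGS="--bluestore-ignore-data-csum" ceph-objectstore-tool \
      --data-path /var/lib/ceph/osd/ceph-0/ --op set-osdmap \
      --file /tmp/osd_map_"$e"; \
  done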
Given the initial cause of the error, I removed the WAL devices:
# ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-0 --devs-source
/var/lib/ceph/osd/ceph-0/block.wal --dev-target
/var/lib/ceph/osd/ceph-0/block --command bluefs-bdev-migrate
# ceph-volume lvm zap /var/lib/ceph/osd/ceph-0/block.wal
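In hindsight it would have been prudent to double-check the
migration before zapping, e.g. (a sketch, I did not actually run
this):
# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0
which lists the labels of the devices still linked under the OSD
directory.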
Here I got bitten by what looks like tracker issue #49554 (the
ceph.wal_* tags were left behind on the block LV), so:
# lvchange --deltag "ceph.wal_device=/dev/ceph-wal/wal-0" --deltag
"ceph.wal_uuid=G7Z5qA-OaJQ-Spvs-X4ec-0SvX-vT2C-C0Dbpe"
/dev/ceph-block-0/block-0
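(To verify that the stale tags are really gone, something like
# lvs -o lv_name,lv_tags /dev/ceph-block-0/block-0
or a quick look at "ceph-volume lvm list" should do; just a sketch.)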
And analogously for osd.1. After restarting the OSDs, deep
scrubbing, and a bit of manual repair, the cluster is healthy again.
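For the record, the scrub/repair part was nothing fancy; roughly
along these lines, with the pgid taken from whatever "ceph health
detail" flagged as inconsistent:
# ceph osd deep-scrub osd.0
# ceph osd deep-scrub osd.1
# ceph pg repair <pgid>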
The reason for the crash turns out to be a known problem with
smartmontools <7.2 and the Micron 2200 NVMes that were used to back
the WAL (https://www.smartmontools.org/ticket/1404). Unfortunately,
the Octopus image ships with smartmontools 7.1, which will crash the
kernel on e.g. "smartctl -a /dev/nvme0". Before switching to Octopus
containers, I was using smartmontools from Debian backports, which
does not have this problem.
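Until the images ship smartmontools >= 7.2, I suppose a stopgap
would be to stop Ceph from invoking smartctl altogether (untested on
my side, so just a sketch):
# ceph device monitoring off
which should at least keep the mgr's periodic device-health scraping
away from the affected NVMes.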
Does Pacific have newer smartmontools?
// Best wishes; Johan
On 2021-07-27 06:35, Eugen Block wrote:
Hi,
Did you read this thread [1] reporting a similar issue? It refers
to a solution described in [2], but the OP in [1] recreated all
OSDs, so it's not clear what the root cause was.
Can you start the OSD with more verbose (debug) output and share
that? Does your cluster really have only two OSDs? Are you running it
with size 2 pools?
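Something along these lines should give a much more detailed log
(adjust the debug levels to taste):
# ceph-osd -d --cluster ceph --id 1 --debug-osd 20 \
    --debug-bluestore 20 --debug-bluefs 20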
[1]
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/EUFDKK3HEA5DPTUVJ5LBNQSWAKZH5ZM7/
[2]
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-August/036592.html
Zitat von Johan Hattne <johan@xxxxxxxxx>:
Dear all;
We have a 3-node cluster with two OSDs on separate nodes, each with
its WAL on NVMe. It's been running fine for quite some time,
albeit under very light load. This week, we moved from
package-based Octopus to container-based ditto (15.2.13, all on
Debian stable). Within a few hours of that change, both OSDs
crashed and dmesg filled up with stuff like:
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read] Request device [06:00.0] PASID ffffffff fault
addr ffbc0000 [fault reason 06] PTE Read access is not set
where 06:00.0 is the NVMe with the wal. This happened at the same
time on *both* OSD nodes, but I'll worry about why this happened
later. I would first like to see if I can get the cluster back up.
From the cephadm shell, I activate OSD 1 and try to start it (I did
create a minimal /etc/ceph/ceph.conf with global "fsid" and "mon
host" for that purpose):
# ceph-volume lvm activate 1 cce125b2-2597-4be9-bd17-23eb059d2778
--no-systemd
# ceph-osd -d --cluster ceph --id 1
This gives "osd.1 0 OSD::init() : unable to read osd superblock",
and the subsequent output indicates that this due to checksum
errors. So ignore checksum mismatches and try again:
# CEPH_ARGS="--bluestore-ignore-data-csum" ceph-osd -d --cluster
ceph --id 1
which results in "osd.1 0 failed to load OSD map for epoch 4372, got
0 bytes". The monitors are at 4378, as per:
# ceph osd stat
2 osds: 0 up (since 47h), 1 in (since 47h); epoch: e4378
Is there any way to get past this? For instance, could I coax the
OSDs into epoch 4378? This is the first time I've dealt with a Ceph
disaster, so there may be all kinds of obvious things I'm missing.
// Best wishes; Johan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx