Hi Dan,
I need to bring the affected OSDs up this week. It would be great if
you could take a look at this case, or let me know if you don't have
time.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: Friday, November 8, 2024 6:13 PM
To: Dan van der Ster
Cc: ceph-users@xxxxxxx
Subject: Re: failed to load OSD map for epoch 2898146, got 0 bytes
Hi Dan,
I have collected a 134M log file (11M compressed) of the startup
with debug_osd=20/20. Do you have access to the upload area of the
ceph-devs (the ceph-post-file destination)? If not, any preferred
way I can send it to you?
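If the ceph-post-file route works, my plan would be something like the
following (just a sketch; the -d description flag is what I remember
from the man page, so please correct me if the tool has changed):

```shell
# Compress the 134M startup log and post it to the ceph.com drop point;
# ceph-post-file prints a UUID that can then be shared in this thread.
xz -9 ceph-osd.1004.log
ceph-post-file -d "osd.1004 'failed to load OSD map' loop, debug_osd=20" \
    ceph-osd.1004.log.xz
```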
To execute the ceph-objectstore-tool mount command, it looks like I
need to activate the OSD first; is this correct? And can I just use
"umount" to unmount it again?
On a side note, I noticed something strange. We have a containerized
deployment, and to keep everything at the same version we also run a
container ceph_adm for admin purposes. We bind-mount /var/lib/ceph
into all containers. I now observed that OSDs we manually recreated
with "ceph-volume batch" inside ceph_adm still have their data path
folders mounted there, as in:
tmpfs on /var/lib/ceph/osd/ceph-110 type tmpfs (rw,relatime,seclabel)
ceph-volume batch apparently leaves this tmpfs mount behind instead
of cleaning it up. For these OSDs we now have the data dir mounted
in two different containers (it's a double mount). It's the first
time I have noticed this, and we never had any problems with it. Is
it something I need to address somehow (like stopping the OSD and
unmounting in ceph_adm, or just restarting ceph_adm), or can I
ignore it? I guess in the future we need to execute an "osd
deactivate" or "umount" to clean this up after OSD creation.
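For the record, this is what I would try for cleanup, unless you advise
otherwise (a sketch; whether "ceph-volume lvm deactivate" is available
on Octopus is something I still need to check):

```shell
# Inside ceph_adm: list leftover tmpfs mounts under the OSD data dirs.
grep 'tmpfs /var/lib/ceph/osd/' /proc/mounts

# With the OSD stopped (or confirmed to use only the mount in its own
# container), detach the stale copy in the admin container.
umount /var/lib/ceph/osd/ceph-110

# Possibly the cleaner route, if the subcommand exists in this release:
# ceph-volume lvm deactivate 110
```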
Thanks for your help!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Dan van der Ster <dan.vanderster@xxxxxxxxx>
Sent: Tuesday, October 22, 2024 11:56 PM
To: Frank Schilder
Cc: ceph-users@xxxxxxx
Subject: Re: failed to load OSD map for epoch 2898146, got 0 bytes
Hi Frank,
I'm glad this thread is more about understanding what's going on, as
opposed to a quick fix. Normally people in this situation should just
zap and redeploy, like you said.
The next thing I'd do is fuse-mount the OSD and see which osdmaps it
has -- try to read them, etc. Does the OSD have the osdmap epoch it is
warning about in the logs?
ceph-objectstore-tool --data-path </path/to/osd> --op fuse --mountpoint /mnt
Inside /mnt you'll see the PGs and a meta folder, IIRC. Inside meta
you will find the osdmaps.
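An untested sketch of the whole round trip (the data path and the object
names under meta are assumptions from memory -- check what actually
shows up on your version):

```shell
# OSD must be stopped; mount its object store via FUSE.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1004 \
    --op fuse --mountpoint /mnt

# Look for the osdmap objects in the meta collection, including the
# epochs from the error messages; zero-size entries would match the
# "got 0 bytes" complaint.
ls -l /mnt/meta/ | grep -i osdmap | head
find /mnt/meta -iname '*osdmap*' -size 0

# A plain umount detaches the FUSE mount again.
umount /mnt
```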
Cheers, dan
--
Dan van der Ster
CTO @ CLYSO
https://clyso.com | dan.vanderster@xxxxxxxxx
On Tue, Oct 22, 2024 at 12:14 AM Frank Schilder <frans@xxxxxx> wrote:
Hi Dan,
I don't remember exactly when I took them down. Maybe a month ago?
The reason was a fatal IO error due to connectivity issues. It's 2
SSDs with 4 OSDs each, and they were installed in JBODs. We have 12
of those and, unfortunately, it seems that 3 have issues with SSDs
even though they have dedicated slots for SSD/NVMe drives. The
hallmark is SAS messages containing "blk_update" (usually
blk_update_request with a drive reset). All attempts to fix that
(reseating, moving, etc.) failed. We had something like 40-50
messages per disk per hour in the log; it didn't impact performance,
though, so it was a nuisance on the lower-priority list.
Some time ago, maybe a month, the connection in one of these JBODs
finally gave up, for both SSDs at the same time. I'm pretty sure it
was a bus error and the disks are fine. I stopped these OSDs and
the data is recovered; no problems here. The disks are part of our
FS metadata pool and we have plenty, so there was no rush. They
store a ton of objects, though, and I try to avoid a full rewrite
of everything (lifetime fetishist).
Yesterday, we started a major disk replacement operation after
evacuating about 40 HDDs. As part of this, we move all SSDs from
the JBODs to the servers and I tried to get these two disks up
yesterday with the result reported below. We are not done yet with
the maintenance operation and I can pull logs after we are done.
Possibly next week. We are not in a rush to get these disks back up
and I'm also prepared to just zap and redeploy these.
My interest in this case is along the lines of "I would like to know
what is going on" and "is there a better way than zap+redeploy?".
Others might be in a situation where they don't have the luxury of
all data being healthy, and here we have a chance to experiment
without any risk.
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Dan van der Ster <dan.vanderster@xxxxxxxxx>
Sent: Tuesday, October 22, 2024 12:04 AM
To: Frank Schilder
Cc: ceph-users@xxxxxxx
Subject: Re: failed to load OSD map for epoch 2898146, got 0 bytes
Hi Frank,
Do you have some more info about these OSDs -- how long were they down
for? Were they down because of some IO errors?
Is it possible that the OSD thinks it stored those osdmaps but IO
errors are preventing them from being loaded?
I know the log is large, but can you share at least a snippet of when
this starts? Preferably with debug_osd = 10.
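If the full log is unwieldy, one cheap way to boil the spam down to a
progress indicator is to keep only the running-maximum epoch from those
messages -- a sketch, with the field layout taken from the lines you
pasted:

```shell
# Print a line only when a new highest epoch appears in the
# "failed to load OSD map for epoch N" messages.
awk '/failed to load OSD map for epoch/ {
    for (i = 1; i <= NF; i++)
        if ($i == "epoch") e = $(i + 1) + 0   # +0 strips the trailing comma
    if (e > max) { max = e; print "reached epoch " max }
}' /var/log/ceph/ceph-osd.1004.log
```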
Thanks, Dan
--
Dan van der Ster
CTO @ CLYSO
https://clyso.com | dan.vanderster@xxxxxxxxx
On Mon, Oct 21, 2024 at 1:32 PM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi Dan,
>
> maybe not. Looking at the output of
>
> grep -B 1 -e "2971464 failed to load OSD map for epoch 2898132" /var/log/ceph/ceph-osd.1004.log
>
> which searches for lines that start a cycle and also prints the
> line before. There might be some progress, but I'm not sure:
>
> 2024-10-21T17:41:40.173+0200 7fad509a1700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898208, got 0 bytes
> 2024-10-21T17:41:40.173+0200 7fad4a194700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898132, got 0 bytes
> --
> 2024-10-21T17:41:40.610+0200 7fad519a3700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898678, got 0 bytes
> 2024-10-21T17:41:40.610+0200 7fad4e19c700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898132, got 0 bytes
> --
> 2024-10-21T17:41:41.340+0200 7fad4c198700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898293, got 0 bytes
> 2024-10-21T17:41:41.340+0200 7fad4f19e700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898132, got 0 bytes
> --
> 2024-10-21T17:41:41.347+0200 7fad4e99d700 -1 osd.1004 2971464 failed to load OSD map for epoch 2899238, got 0 bytes
> 2024-10-21T17:41:41.347+0200 7fad4c999700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898132, got 0 bytes
>
> The loop seems to run longer and longer. The problem, though, is
> that by the time it got here it had already written about 1G of
> log. The loop seems to repeat over all epochs on each iteration,
> so the log output is quadratic in the number of epochs to catch up
> with. I still have something like 100000 to go and I doubt I have
> disks large enough to collect the resulting logs. The logging is
> probably also a total performance killer.
>
> Is it possible to suppress the massive log spam so that I can let
> it run until it is marked up? These messages seem not to be tied
> to a log level. If absolutely necessary, I could start the OSD
> manually with logging to disk disabled.
>
> Thanks and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Dan van der Ster <dan.vanderster@xxxxxxxxx>
> Sent: Monday, October 21, 2024 9:03 PM
> To: Frank Schilder
> Cc: ceph-users@xxxxxxx
> Subject: Re: failed to load OSD map for epoch 2898146, got 0 bytes
>
> Hi Frank,
>
> Are you sure it's looping over the same epochs?
> It looks like that old osd is trying to catch up on all the osdmaps it
> missed while it was down. (And those old maps are probably trimmed
> from all the mons and osds, based on the "got 0 bytes" error).
> Eventually it should catch up to the current (e 2971464 according to
> your log), and then the PGs can go active.
>
> Cheers, Dan
>
> --
> Dan van der Ster
> CTO @ CLYSO
> https://clyso.com | dan.vanderster@xxxxxxxxx
>
>
>
>
> On Mon, Oct 21, 2024 at 9:13 AM Frank Schilder <frans@xxxxxx> wrote:
> >
> > Hi all,
> >
> > I have a strange problem on an Octopus (latest) cluster. We had
> > a couple of SSD OSDs down for a while and brought them up again
> > today. For some reason, these OSDs don't come up and flood the
> > log with messages like
> >
> > osd.1004 2971464 failed to load OSD map for epoch 2898146, got 0 bytes
> >
> > These messages cycle through the same epochs over and over
> > again. I did not really find much help; there is an old thread
> > about a similar or the same problem on a home-lab cluster,
> > albeit with new OSDs, I believe. I couldn't really find useful
> > information there. The OSDs seem to boot fine and then end up in
> > something like a death loop. Below are some snippets from the
> > OSD log.
> >
> > Any hints appreciated.
> > Thanks and best regards,
> > Frank
> >
> > After OSD start, everything looks normal up to here:
> >
> > 2024-10-21T17:41:39.136+0200 7fad73cf6f00 0 osd.1004 2971464 load_pgs opened 205 pgs
> > 2024-10-21T17:41:39.140+0200 7fad73cf6f00 -1 osd.1004 2971464 log_to_monitors {default=true}
> > 2024-10-21T17:41:39.150+0200 7fad73cf6f00 -1 osd.1004 2971464 mon_cmd_maybe_osd_create fail: 'osd.1004 has already bound to class 'fs_meta', can not reset class to 'ssd'; use 'ceph osd crush rm-device-class <id>' to remove old class first': (16) Device or resource busy
> > 2024-10-21T17:41:39.155+0200 7fad519a3700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898132, got 0 bytes
> > 2024-10-21T17:41:39.155+0200 7fad511a2700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898132, got 0 bytes
> > 2024-10-21T17:41:39.155+0200 7fad511a2700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898133, got 0 bytes
> > 2024-10-21T17:41:39.155+0200 7fad511a2700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898134, got 0 bytes
> > 2024-10-21T17:41:39.155+0200 7fad511a2700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898135, got 0 bytes
> > 2024-10-21T17:41:39.155+0200 7fad511a2700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898136, got 0 bytes
> > 2024-10-21T17:41:39.155+0200 7fad4f99f700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898132, got 0 bytes
> > 2024-10-21T17:41:39.155+0200 7fad4f99f700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898133, got 0 bytes
> > 2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898132, got 0 bytes
> > 2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898133, got 0 bytes
> > 2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898134, got 0 bytes
> > 2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898135, got 0 bytes
> > 2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898136, got 0 bytes
> > 2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898137, got 0 bytes
> > 2024-10-21T17:41:39.155+0200 7fad73cf6f00 0 osd.1004 2971464 done with init, starting boot process
> > 2024-10-21T17:41:39.155+0200 7fad73cf6f00 1 osd.1004 2971464 start_boot
> > 2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898138, got 0 bytes
> > 2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898139, got 0 bytes
> > 2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898140, got 0 bytes
> > 2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898141, got 0 bytes
> > 2024-10-21T17:41:39.155+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898142, got 0 bytes
> > 2024-10-21T17:41:39.156+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898143, got 0 bytes
> > 2024-10-21T17:41:39.156+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898144, got 0 bytes
> > 2024-10-21T17:41:39.156+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898145, got 0 bytes
> > 2024-10-21T17:41:39.156+0200 7fad4b196700 -1 osd.1004 2971464 failed to load OSD map for epoch 2898146, got 0 bytes
> >
> >
> > These messages repeat over and over again with some others of
> > this form showing up every now and then:
> >
> > 2024-10-21T17:41:39.476+0200 7fad651ca700 4 rocksdb: [db/compaction_job.cc:1332] [default] [JOB 12] Generated table #82879: 76571 keys, 67866714 bytes
> > 2024-10-21T17:41:39.688+0200 7fad651ca700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1729525299690000, "cf_name": "default", "job": 12, "event": "table_file_creation", "file_number": 82879, "file_size": 67866714, "table_properties": {"data_size": 67111697, "index_size": 562601, "filter_size": 191557, "raw_key_size": 4823973, "raw_average_key_size": 63, "raw_value_size": 62631087, "raw_average_value_size": 817, "num_data_blocks": 15644, "num_entries": 76571, "filter_policy_name": "rocksdb.BuiltinBloomFilter"}}
> >
> >
> > And another occasion:
> >
> > 2024-10-21T17:41:40.520+0200 7fad651ca700 4 rocksdb: [db/compaction_job.cc:1332] [default] [JOB 12] Generated table #82880: 76774 keys, 67868330 bytes
> > 2024-10-21T17:41:40.520+0200 7fad501a0700 -1 osd.1004 2971464 failed to load OSD map for epoch 2899234, got 0 bytes
> > 2024-10-21T17:41:40.520+0200 7fad501a0700 -1 osd.1004 2971464 failed to load OSD map for epoch 2899235, got 0 bytes
> > 2024-10-21T17:41:40.520+0200 7fad501a0700 -1 osd.1004 2971464 failed to load OSD map for epoch 2899236, got 0 bytes
> > 2024-10-21T17:41:40.520+0200 7fad651ca700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1729525300521403, "cf_name": "default", "job": 12, "event": "table_file_creation", "file_number": 82880, "file_size": 67868330, "table_properties": {"data_size": 67113021, "index_size": 562509, "filter_size": 191941, "raw_key_size": 4836742, "raw_average_key_size": 62, "raw_value_size": 62623274, "raw_average_value_size": 815, "num_data_blocks": 15630, "num_entries": 76774, "filter_policy_name": "rocksdb.BuiltinBloomFilter"}}
> >
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx