Hi,
I'm not sure if and how this could help, but there is a get-crushmap
command for ceph-monstore-tool:
[ceph: root@host1 /]# ceph-monstore-tool /var/lib/ceph/mon/ceph-host1/
show-versions -- --map-type crushmap > show-versions
[ceph: root@host1 /]# cat show-versions
first committed: 0
last committed: 0
[ceph: root@host1 /]# ceph-monstore-tool /var/lib/ceph/mon/ceph-host1/
get-crushmap --version 0 > crushmap-version-0
[ceph: root@host1 /]# cat crushmap-version-0
ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
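Just for context on what I would normally expect: a binary crushmap retrieved
like that should be decompilable with crushtool, e.g. (output file name is just
a placeholder):

# crushtool -d crushmap-version-0 -o crushmap-version-0.txt

which obviously won't get anywhere here, since the retrieved file only contains
the version banner.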
I don't have the option to shut down a MON in production right now to
check whether there are more committed versions. And obviously, the
result is not what I would usually expect from a crushmap. I also
injected a modified crushmap to provoke a new version:
# ceph osd setcrushmap -i 20240417-crushmap.new
363
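In case it matters for reproducing: the usual way to build such a modified map
is the crushtool round-trip, roughly like this (the intermediate file names are
just placeholders):

# ceph osd getcrushmap -o 20240417-crushmap.bin
# crushtool -d 20240417-crushmap.bin -o 20240417-crushmap.txt
# ... edit the decompiled map ...
# crushtool -c 20240417-crushmap.txt -o 20240417-crushmap.new
# ceph osd setcrushmap -i 20240417-crushmap.new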
But the result doesn't really change, so I'm not sure how that can help:
[ceph: root@host1 /]# ceph-monstore-tool /var/lib/ceph/mon/ceph-host1/
get-crushmap --version 363
ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
It seems that all the commands print the same output:
[ceph: root@host1 /]# ceph-monstore-tool /var/lib/ceph/mon/ceph-host1/
get-crushmap --version 5885
ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
[ceph: root@host1 /]# ceph-monstore-tool /var/lib/ceph/mon/ceph-host1/
get-osdmap --version 5885
ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
[ceph: root@host1 /]# ceph-monstore-tool /var/lib/ceph/mon/ceph-host1/
get-monmap --version 5885
ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
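One other idea that might be worth a try (untested on my side): the mons keep a
range of full osdmaps, and every osdmap epoch embeds its crushmap, so as long as
the interesting epoch hasn't been trimmed yet, you should be able to pull the
crushmap out with the regular tools, e.g. (reusing the epoch from above):

# ceph osd getmap 5885 -o osdmap.5885
# osdmaptool osdmap.5885 --export-crush crushmap.5885
# crushtool -d crushmap.5885 -o crushmap.5885.txt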
Maybe one of the devs can shed some light on whether there's a way.
Regards,
Eugen
Quoting Blair Bethwaite <blair.bethwaite@xxxxxxxxx>:
Hi all,
Do the Mons store any crushmap history, and if so how does one get at it
please?
I ask because we've recently encountered an issue in a medium-scale (~5PB
raw), EC-based, RGW-focused cluster where something happened (we still
don't know what) that suddenly caused us to see 94% of objects (5.4
billion of them) misplaced. We've tracked down the first log message of
that pgmap state change:
Mar 29 10:30:31 mon1 bash[5804]: debug 2024-03-29T10:30:31.152+0000 7f3b6e378700 0 log_channel(cluster) log [DBG] : pgmap v44327: 2273 pgs: 225 active+clean, 2038 active+remapped+backfill_wait, 10 active+remapped+backfilling; 1.6 PiB data, 2.1 PiB used, 2.2 PiB / 4.3 PiB avail; 5426274136/5752755429 objects misplaced (94.325%); 248 MiB/s, 109 objects/s recovering
This appears to have been preceded (aside from a single HTTP HEAD request
coming into RGW) by a 5-minute gap in the logs where either journald couldn't
keep up with debug messages or the Mons were stuck. The last log before
that gap seems to be a compaction event kicking off:
mon1 bash[25927]:  Int      0/0    0.00 KB   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0    0.0    0.00   0.00    0   0.000     0     0
Mar 29 10:24:14 mon1 bash[25927]: ** Compaction Stats [L] **
Mar 29 10:24:14 mon1 bash[25927]: Priority  Files  Size     Score  Read(GB)  Rn(GB)  Rnp1(GB)  Write(GB)  Wnew(GB)  Moved(GB)  W-Amp  Rd(MB/s)  Wr(MB/s)  Comp(sec)  CompMergeCPU(sec)  Comp(cnt)  Avg(sec)  KeyIn  KeyDrop
Mar 29 10:24:14 mon1 bash[25927]: -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Mar 29 10:24:14 mon1 bash[25927]:  Low      0/0    0.00 KB   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   116.0  11.4   0.02   0.01    7   0.003   490   462
Mar 29 10:24:14 mon1 bash[25927]: High      0/0    0.00 KB   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0    1.9    1.23   1.20   28   0.044     0     0
Mar 29 10:24:14 mon1 bash[25927]: User      0/0    0.00 KB   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0    16.4   0.00   0.00    1   0.001     0     0
We're left wondering what the heck has happened to cause such a huge
redistribution of data in the cluster when we've not made any corresponding
changes, so we want to see if there are any breadcrumbs we can find.
Appreciate any pointers!
--
Cheers,
~Blairo
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx