Hi all,

Do the Mons store any crushmap history, and if so, how does one get at it, please?

I ask because we've recently encountered an issue in a medium-scale (~5 PB raw), EC-based, RGW-focused cluster where "something" happened -- we still don't know what -- that suddenly caused us to see 94% of objects (5.4 billion of them) misplaced.

We've tracked down the first log message of that pgmap state change:

Mar 29 10:30:31 mon1 bash[5804]: debug 2024-03-29T10:30:31.152+0000 7f3b6e378700 0 log_channel(cluster) log [DBG] : pgmap v44327: 2273 pgs: 225 active+clean, 2038 active+remapped+backfill_wait, 10 active+remapped+backfilling; 1.6 PiB data, 2.1 PiB used, 2.2 PiB / 4.3 PiB avail; 5426274136/5752755429 objects misplaced (94.325%); 248 MiB/s, 109 objects/s recovering

This appears to have been preceded (aside from a single HTTP HEAD request coming into RGW) by a five-minute gap in the logs, during which either journald couldn't keep up with the debug messages or the Mons were stuck. The last log before that gap seems to be a RocksDB compaction event kicking off:

mon1 bash[25927]: Int      0/0    0.00 KB   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.00   0.00   0   0.000   0   0
Mar 29 10:24:14 mon1 bash[25927]: ** Compaction Stats [L] **
Mar 29 10:24:14 mon1 bash[25927]: Priority   Files   Size   Score   Read(GB)   Rn(GB)   Rnp1(GB)   Write(GB)   Wnew(GB)   Moved(GB)   W-Amp   Rd(MB/s)   Wr(MB/s)   Comp(sec)   CompMergeCPU(sec)   Comp(cnt)   Avg(sec)   KeyIn   KeyDrop
Mar 29 10:24:14 mon1 bash[25927]: -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Mar 29 10:24:14 mon1 bash[25927]: Low      0/0    0.00 KB   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   116.0   11.4   0.02   0.01   7   0.003   490   462
Mar 29 10:24:14 mon1 bash[25927]: High     0/0    0.00 KB   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   1.9   1.23   1.20   28   0.044   0   0
Mar 29 10:24:14 mon1 bash[25927]: User     0/0    0.00 KB   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   16.4   0.00   0.00   1   0.001   0   0

We're left wondering what the heck happened to cause such a huge redistribution of data in the cluster when we haven't made any corresponding changes, so we'd like to see whether there are any breadcrumbs we can find. Appreciate any pointers!

--
Cheers,
~Blairo
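
P.S. For concreteness, here's the sort of thing we're hoping is possible -- assuming the Mons do keep a window of historical OSD maps (each of which embeds the CRUSH map for that epoch) and that the usual getmap/osdmaptool/crushtool route applies; the epoch numbers below (12340/12345) are only placeholders:

  # find the current OSD map epoch
  ceph osd stat

  # fetch two historical OSD maps by epoch, if the Mons still hold them
  ceph osd getmap 12340 -o osdmap.12340
  ceph osd getmap 12345 -o osdmap.12345

  # extract the CRUSH map embedded in each epoch and decompile it to text
  osdmaptool osdmap.12340 --export-crush crush.12340
  osdmaptool osdmap.12345 --export-crush crush.12345
  crushtool -d crush.12340 -o crush.12340.txt
  crushtool -d crush.12345 -o crush.12345.txt

  # see whether the CRUSH map actually changed between those epochs
  diff -u crush.12340.txt crush.12345.txt

What we don't know is how far back the Mons keep those maps, or whether anything from around the incident has already been trimmed -- hence the question.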