Re: PSA: upgrading older clusters without CephFS

Hi Patrick,

Thanks a lot for letting us know about this issue!

After reading your fix[1] carefully, I understand the heart of the issue to be
this:
Since Jewel, CephFS has used a new data structure, FSMap (introduced for
multi-FS support), as the Paxos value in the monitor, whereas pre-Jewel
monitors stored an MDSMap. If CephFS was never used, that initial MDSMap
stays in the monitor DB forever and is never trimmed.
Starting with Pacific, the monitor no longer expects (or can decode) the old
MDSMap structure from the DB, which is what causes the crash.

To detect whether any old MDSMap still exists, we just need to get the oldest
mdsmap from the monitor DB and try to decode it with the Pacific
ceph-dencoder. We can do the following (a rough command sketch follows the
list):
1. Stop one monitor (this has to be done during the upgrade anyway).
2. Export the binary of the first committed mdsmap from the monitor DB
(ceph-kvstore-tool can do this).
3. Feed the binary to the Pacific version of ceph-dencoder.
4. If the binary can be decoded, we can be sure there is no legacy data
structure.
    Otherwise, a legacy data structure is present and the upgrade needs a
short stop at the just-released Octopus v15.2.14 before continuing to
Pacific.

I've done some testing and it worked. Below is the same crash stack that
appears when I use the Pacific ceph-dencoder to decode the mdsmap from a
cluster (without CephFS) that was upgraded from Firefly.

~# ceph-dencoder import mdsmap.1.f2j type FSMap decode dump_json
/build/ceph-dJyyVB/ceph-16.2.0/src/mds/FSMap.cc: In function 'void FSMap::decode(ceph::buffer::v15_2_0::list::const_iterator&)' thread 7fda1b03a240 time 2021-08-08T04:27:57.491978+0000
/build/ceph-dJyyVB/ceph-16.2.0/src/mds/FSMap.cc: 648: ceph_abort_msg("abort() called")
 ceph version 16.2.0 (0c2054e95bcd9b30fdd908a79ac1d8bbc3394442) pacific (stable)
 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe0) [0x7fda1e3a652d]
 2: (FSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0xdca) [0x7fda1e9535aa]
 3: (DencoderBase<FSMap>::decode[abi:cxx11](ceph::buffer::v15_2_0::list, unsigned long)+0x54) [0x55b3a5e6ed84]
 4: main()
 5: __libc_start_main()
 6: _start()
Aborted (core dumped)

Basically, these steps follow the same workflow the monitor itself uses to
load the mdsmap from its DB and decode it, so a failed decode here means the
monitor would hit the same crash during the upgrade.

[1] https://github.com/ceph/ceph/pull/42349

Best Regards,
Dongdong


On Sat, Aug 7, 2021 at 4:28 AM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:

> Hello Linh,
>
> On Thu, Aug 5, 2021 at 9:12 PM Linh Vu <linh.vu@xxxxxxxxxxxxxxxxx> wrote:
> > Without personally knowing the history of a cluster, is there a way to
> > check and see when and which release it began life as? Or check whether
> > such legacy data structures still exist in the mons?
>
> I'm not aware of an easy way to check the release a cluster started
> as. And unfortunately, there is no way to check for legacy data
> structures. If your cluster has used CephFS at all since Jewel, it's
> very unlikely there will be any in the mon stores. If you're not sure,
> best to upgrade through v15.2.14 to be safe.
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Principal Software Engineer
> Red Hat Sunnyvale, CA
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>
> _______________________________________________
> Dev mailing list -- dev@xxxxxxx
> To unsubscribe send an email to dev-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



