Re: [ceph-users] PSA: upgrading older clusters without CephFS

Patrick Donnelly <pdonnell@xxxxxxxxxx> · Mon, 16 Aug 2021 14:33:41 -0700

Hi Dongdong,

On Sun, Aug 8, 2021 at 10:08 PM 陶冬冬 <tdd21151186@xxxxxxxxx> wrote:
>
> Hi Patrick,
>
> Thanks a lot for letting us know about this issue!
>
> After reading your fix [1] carefully, I understand the heart of the issue to be this:
> Since Jewel, CephFS has used a new data structure, FSMap (for MultiFS), and the monitor stores this structure as the Paxos value.
> Before Jewel, the structure stored in the monitor DB was MDSMap, and the initial MDSMap stays in the DB and is never trimmed if CephFS is never used.
> Starting with Pacific, the monitor no longer expects the MDSMap structure in the DB, which causes the crash.
>
> To detect whether any old MDSMap exists, we just need to get the oldest mdsmap from the monitor DB and try to decode it with the Pacific ceph-dencoder.
> We can do the following:
> 1. Stop one monitor (this has to be done during the upgrade)
> 2. Export the binary of the first committed mdsmap from the monitor DB (ceph-kvstore-tool can do this)
> 3. Feed the binary to the Pacific version of ceph-dencoder
> 4. If the binary can be decoded, we can be sure there is no legacy data structure.
>     Otherwise, a legacy data structure is present, and the upgrade needs a short stop at the just-released Octopus v15.2.14 before continuing to Pacific.
>
> I've done some testing and it worked. Below is the same crash stack, produced when I used the Pacific ceph-dencoder to decode the mdsmap from a cluster (without CephFS) upgraded from Firefly.
>
> ~# ceph-dencoder import mdsmap.1.f2j type FSMap decode dump_json
> /build/ceph-dJyyVB/ceph-16.2.0/src/mds/FSMap.cc: In function 'void FSMap::decode(ceph::buffer::v15_2_0::list::const_iterator&)' thread 7fda1b03a240 time 2021-08-08T04:27:57.491978+0000
> /build/ceph-dJyyVB/ceph-16.2.0/src/mds/FSMap.cc: 648: ceph_abort_msg("abort() called")
>  ceph version 16.2.0 (0c2054e95bcd9b30fdd908a79ac1d8bbc3394442) pacific (stable)
>  1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe0) [0x7fda1e3a652d]
>  2: (FSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0xdca) [0x7fda1e9535aa]
>  3: (DencoderBase<FSMap>::decode[abi:cxx11](ceph::buffer::v15_2_0::list, unsigned long)+0x54) [0x55b3a5e6ed84]
>  4: main()
>  5: __libc_start_main()
>  6: _start()
> Aborted (core dumped)
>
> Basically, the above steps follow the same workflow the monitor uses to load the mdsmap from the DB and decode it.
>
> [1] https://github.com/ceph/ceph/pull/42349

The steps you outlined look reasonable.
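
For anyone who wants to script the check, the quoted steps could be sketched roughly as below. This is only an illustration under several assumptions, not a tested procedure: the helper names are hypothetical, the "mdsmap" prefix and "first_committed" key are assumed to be how the monitor store lays out this service, and the stored version is assumed to be a little-endian 64-bit integer. The ceph-dencoder arguments are the ones from the quoted message. The monitor must be stopped first, and ceph-dencoder must be the Pacific build.

```python
import struct
import subprocess
from pathlib import Path


def kvstore_get_cmd(store_path, backend, key, out_file):
    """Build a ceph-kvstore-tool command that writes mdsmap/<key> to out_file.

    backend is the store type of the monitor DB (e.g. "leveldb" or "rocksdb");
    monitors created on very old releases may still be on leveldb.
    """
    return ["ceph-kvstore-tool", backend, str(store_path),
            "get", "mdsmap", str(key), "out", str(out_file)]


def dencoder_cmd(blob):
    """Build the ceph-dencoder invocation used in the quoted message."""
    return ["ceph-dencoder", "import", str(blob),
            "type", "FSMap", "decode", "dump_json"]


def parse_version(raw):
    """Read a stored version number (assumed little-endian unsigned 64-bit)."""
    return struct.unpack("<Q", raw[:8])[0]


def has_legacy_mdsmap(store_path, backend, workdir="."):
    """Return True if the oldest committed mdsmap does not decode as FSMap.

    Run this only against the store of a stopped monitor.
    """
    workdir = Path(workdir)

    # Look up which epoch is the oldest one still in the store.
    fc_file = workdir / "mdsmap.first_committed"
    subprocess.run(kvstore_get_cmd(store_path, backend, "first_committed", fc_file),
                   check=True)
    epoch = parse_version(fc_file.read_bytes())

    # Export that epoch's binary blob.
    blob = workdir / f"mdsmap.{epoch}"
    subprocess.run(kvstore_get_cmd(store_path, backend, epoch, blob), check=True)

    # A non-zero exit (including the abort shown in the quoted trace) means
    # the blob could not be decoded as an FSMap.
    result = subprocess.run(dencoder_cmd(blob), capture_output=True)
    return result.returncode != 0
```

Calling has_legacy_mdsmap() with the path to the stopped monitor's store.db and its backend type would return True when the short stop at v15.2.14 is needed. The store path and backend are left as required arguments because they differ between deployments.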

Thanks,

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx