Hi Patrick,
Thanks a lot for letting us know about this issue!
After reading your fix[1] carefully, I understand the heart of the issue to be this:
Since Jewel, CephFS has used a new data structure, FSMap (introduced for multi-FS support), and the monitor has been storing this new structure as the Paxos value.
Pre-Jewel, however, the structure stored in the monitor DB was MDSMap, and that initial MDSMap stays in the DB and never gets trimmed if CephFS was never used at all.
Starting with Pacific, the monitor no longer expects an MDSMap structure in the DB, and that is what causes the crash.
To detect whether any old MDSMap still exists, we just need to fetch the oldest mdsmap from the monitor DB and try to decode it with the Pacific ceph-dencoder.
We can do the following:
1. Stop one monitor (since this has to be done during the upgrade anyway).
2. Export the binary of the first committed mdsmap from the monitor DB (ceph-kvstore-tool can do this; see the command sketch below).
3. Feed the binary to the Pacific version of ceph-dencoder.
4. If the binary can be decoded, we can be sure there is no legacy data structure. Otherwise, a legacy data structure is present, and the cluster needs a short upgrade stop at the just-released Octopus v15.2.14 before continuing to Pacific.
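For reference, here is a rough sketch of the commands for steps 2-3. The mon store path, the "mdsmap" key prefix, and the epoch key shown here are assumptions and may differ on your cluster, so please confirm the layout with "list" first:

# Run these on the stopped monitor's host (store path assumed below).
MON_STORE=/var/lib/ceph/mon/ceph-<mon-id>/store.db

# Confirm the key layout: mdsmap epochs live under the "mdsmap" prefix.
ceph-kvstore-tool rocksdb "$MON_STORE" list mdsmap

# The oldest retained epoch (the value is a binary-encoded epoch number).
ceph-kvstore-tool rocksdb "$MON_STORE" get mdsmap first_committed

# Export the binary of the first committed epoch (epoch 1 assumed here).
ceph-kvstore-tool rocksdb "$MON_STORE" get mdsmap 1 out /tmp/mdsmap.1.bin

# Try to decode it with the Pacific ceph-dencoder.
# Success means no legacy structure; an abort like the one below means a
# legacy MDSMap is present and the v15.2.14 stop is needed.
ceph-dencoder import /tmp/mdsmap.1.bin type FSMap decode dump_json

(If the store is still leveldb on a very old cluster, replace "rocksdb" with "leveldb" accordingly.)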
I've done some testing and it worked. Below is the same crash stack that appears when I use the Pacific ceph-dencoder to decode the mdsmap from a cluster (without CephFS) that was upgraded from Firefly.
~# ceph-dencoder import mdsmap.1.f2j type FSMap decode dump_json
/build/ceph-dJyyVB/ceph-16.2.0/src/mds/FSMap.cc: In function 'void FSMap::decode(ceph::buffer::v15_2_0::list::const_iterator&)' thread 7fda1b03a240 time 2021-08-08T04:27:57.491978+0000
/build/ceph-dJyyVB/ceph-16.2.0/src/mds/FSMap.cc: 648: ceph_abort_msg("abort() called")
ceph version 16.2.0 (0c2054e95bcd9b30fdd908a79ac1d8bbc3394442) pacific (stable)
1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xe0) [0x7fda1e3a652d]
2: (FSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl<true>&)+0xdca) [0x7fda1e9535aa]
3: (DencoderBase<FSMap>::decode[abi:cxx11](ceph::buffer::v15_2_0::list, unsigned long)+0x54) [0x55b3a5e6ed84]
4: main()
5: __libc_start_main()
6: _start()
Aborted (core dumped)
Basically, the above steps follow the same workflow the monitor uses to load the mdsmap from the DB and decode it.
[1] https://github.com/ceph/ceph/pull/42349
Best Regards,
Dongdong
Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote on Sat, Aug 7, 2021 at 4:28 AM:
Hello Linh,
On Thu, Aug 5, 2021 at 9:12 PM Linh Vu <linh.vu@xxxxxxxxxxxxxxxxx> wrote:
> Without personally knowing the history of a cluster, is there a way to check and see when and which release it began life as? Or check whether such legacy data structures still exist in the mons?
I'm not aware of an easy way to check the release a cluster started
as. And unfortunately, there is no way to check for legacy data
structures. If your cluster has used CephFS at all since Jewel, it's
very unlikely there will be any in the mon stores. If you're not sure,
best to upgrade through v15.2.14 to be safe.
--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx