Re: v13.2.7 osds crash in build_incremental_map_msg

My advice is to wait.

We built 13.2.7 with https://github.com/ceph/ceph/pull/26448
cherry-picked, and the OSDs no longer crash.

My vote would be for a quick 13.2.8.
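
For anyone curious, here is the shape of the bug as I understand it,
as a minimal self-contained C++ sketch -- NOT the actual Ceph code and
NOT PR 26448, just an illustration. load_inc_map, load_full_map, and
the in-memory stores are hypothetical stand-ins for the OSD's map
store. The pattern: walk epochs newest-to-oldest attaching incremental
maps, fall back to a full map when an incremental is missing, and
return an error instead of abort()ing when neither can be loaded (that
abort is what kills the tp_osd_tp thread in the trace below).

#include <cstdint>
#include <iostream>
#include <map>
#include <optional>
#include <string>

using epoch_t = uint32_t;
using bufferlist = std::string;  // stand-in for ceph::bufferlist

// Hypothetical in-memory stores standing in for the OSD's on-disk maps.
std::map<epoch_t, bufferlist> inc_store;   // incremental maps by epoch
std::map<epoch_t, bufferlist> full_store;  // full maps by epoch

std::optional<bufferlist> load_inc_map(epoch_t e) {
  auto it = inc_store.find(e);
  if (it == inc_store.end()) return std::nullopt;
  return it->second;
}

std::optional<bufferlist> load_full_map(epoch_t e) {
  auto it = full_store.find(e);
  if (it == full_store.end()) return std::nullopt;
  return it->second;
}

struct MapMsg {
  std::map<epoch_t, bufferlist> incremental_maps;
  std::map<epoch_t, bufferlist> maps;  // full maps
};

// Build the map payload for epochs (since, to]. Returns nullopt rather
// than abort()ing when a map cannot be read, so the caller can retry
// or resend instead of crashing the OSD.
std::optional<MapMsg> build_incremental_map_msg(epoch_t since, epoch_t to) {
  MapMsg m;
  for (epoch_t e = to; e > since; --e) {
    if (auto inc = load_inc_map(e)) {
      m.incremental_maps[e] = std::move(*inc);
    } else if (auto full = load_full_map(e)) {
      // No incremental for e: ship the full map at e and stop; the
      // peer applies it, then the newer incrementals gathered above.
      m.maps[e] = std::move(*full);
      break;
    } else {
      // v13.2.7 effectively abort()ed here ("unable to load latest
      // map"); returning an error keeps the thread alive.
      return std::nullopt;
    }
  }
  return m;
}

int main() {
  full_store[10] = "full-10";
  inc_store[11] = "inc-11";
  inc_store[12] = "inc-12";
  // Epoch 10 has no incremental, so the full map is attached instead.
  auto msg = build_incremental_map_msg(9, 12);
  std::cout << (msg ? "built map msg" : "missing map, would have crashed")
            << std::endl;
}

The real fix upstream is more involved, of course, but that is roughly
the difference between an OSD that can back off and one that dies
mid-upgrade.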

-- Dan

On Wed, Dec 4, 2019 at 2:41 PM Frank Schilder <frans@xxxxxx> wrote:
>
> Is this issue now a no-go for updating to 13.2.7, or are there only some specific unsafe scenarios?
>
> Best regards,
>
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Dan van der Ster <dan@xxxxxxxxxxxxxx>
> Sent: 03 December 2019 16:42:45
> To: ceph-users
> Subject: Re: v13.2.7 osds crash in build_incremental_map_msg
>
> I created https://tracker.ceph.com/issues/43106 and we're downgrading
> our OSDs back to 13.2.6.
>
> -- dan
>
> On Tue, Dec 3, 2019 at 4:09 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >
> > Hi all,
> >
> > We're midway through an update from 13.2.6 to 13.2.7, and OSDs have
> > started crashing regularly like this [1].
> > Does anyone know offhand what the issue is? (Maybe
> > https://github.com/ceph/ceph/pull/26448/files ?)
> > Or is it a transient problem while we still have v13.2.6 and
> > v13.2.7 OSDs running concurrently?
> >
> > Thanks!
> >
> > Dan
> >
> > [1]
> >
> > 2019-12-03 15:53:51.817 7ff3a3d39700 -1 osd.1384 2758889
> > build_incremental_map_msg missing incremental map 2758889
> > 2019-12-03 15:53:51.817 7ff3a453a700 -1 osd.1384 2758889
> > build_incremental_map_msg missing incremental map 2758889
> > 2019-12-03 15:53:51.817 7ff3a453a700 -1 osd.1384 2758889
> > build_incremental_map_msg unable to load latest map 2758889
> > 2019-12-03 15:53:51.822 7ff3a453a700 -1 *** Caught signal (Aborted) **
> >  in thread 7ff3a453a700 thread_name:tp_osd_tp
> >
> >  ceph version 13.2.7 (71bd687b6e8b9424dd5e5974ed542595d8977416) mimic (stable)
> >  1: (()+0xf5f0) [0x7ff3c620b5f0]
> >  2: (gsignal()+0x37) [0x7ff3c522b337]
> >  3: (abort()+0x148) [0x7ff3c522ca28]
> >  4: (OSDService::build_incremental_map_msg(unsigned int, unsigned int,
> > OSDSuperblock&)+0x767) [0x555d60e8d797]
> >  5: (OSDService::send_incremental_map(unsigned int, Connection*,
> > std::shared_ptr<OSDMap const>&)+0x39e) [0x555d60e8dbee]
> >  6: (OSDService::share_map_peer(int, Connection*,
> > std::shared_ptr<OSDMap const>)+0x159) [0x555d60e8eda9]
> >  7: (OSDService::send_message_osd_cluster(int, Message*, unsigned
> > int)+0x1a5) [0x555d60e8f085]
> >  8: (ReplicatedBackend::issue_op(hobject_t const&, eversion_t const&,
> > unsigned long, osd_reqid_t, eversion_t, eversion_t, hobject_t,
> > hobject_t, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t>
> > > const&, boost::optional<pg_hit_set_history_t>&,
> > ReplicatedBackend::InProgressOp*, ObjectStore::Transaction&)+0x452)
> > [0x555d6116e522]
> >  9: (ReplicatedBackend::submit_transaction(hobject_t const&,
> > object_stat_sum_t const&, eversion_t const&,
> > std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&,
> > eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t,
> > std::allocator<pg_log_entry_t> > const&,
> > boost::optional<pg_hit_set_history_t>&, Context*, unsigned long,
> > osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x6f5) [0x555d6117ed85]
> >  10: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*,
> > PrimaryLogPG::OpContext*)+0xd62) [0x555d60ff5142]
> >  11: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0xf12)
> > [0x555d61035902]
> >  12: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x3679)
> > [0x555d610397a9]
> >  13: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
> > ThreadPool::TPHandle&)+0xc99) [0x555d6103d869]
> >  14: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> > boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1b7)
> > [0x555d60e8e8a7]
> >  15: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&,
> > ThreadPool::TPHandle&)+0x62) [0x555d611144c2]
> >  16: (OSD::ShardedOpWQ::_process(unsigned int,
> > ceph::heartbeat_handle_d*)+0x592) [0x555d60eb25f2]
> >  17: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x3d3)
> > [0x7ff3c929f5b3]
> >  18: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7ff3c92a01a0]
> >  19: (()+0x7e65) [0x7ff3c6203e65]
> >  20: (clone()+0x6d) [0x7ff3c52f388d]
> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> > needed to interpret this.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com