Re: After OSD Flap - FAILED assert(oi.version == i->first)

Puzzling, added a question to the ticket.
-Sam

On Thu, Nov 17, 2016 at 4:32 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> Hi Sam,
>
> I've updated the ticket with logs from the wip run.
>
> Nick
>
>> -----Original Message-----
>> From: Samuel Just [mailto:sjust@xxxxxxxxxx]
>> Sent: 15 November 2016 18:30
>> To: Nick Fisk <nick@xxxxxxxxxx>
>> Cc: Ceph Users <ceph-users@xxxxxxxxxxxxxx>
>> Subject: Re:  After OSD Flap - FAILED assert(oi.version == i->first)
>>
>> http://tracker.ceph.com/issues/17916
>>
>> I just pushed a branch, wip-17916-jewel, based on v10.2.3 with some additional debugging. Once it builds, would you be able to start
>> the afflicted OSDs with that version of ceph-osd and
>>
>> debug osd = 20
>> debug ms = 1
>> debug filestore = 20
>>
>> and get me the log?
>> -Sam
>>
>> On Tue, Nov 15, 2016 at 2:06 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> > Hi,
>> >
>> > I have two OSDs which are failing with an assert that looks related
>> > to missing objects. This happened after a large RBD snapshot was
>> > deleted, causing several OSDs to start flapping as they experienced
>> > high load. The cluster is fully recovered and I don't need any help
>> > from a recovery perspective. I'm happy to zap and recreate the OSDs,
>> > which I will probably do in a couple of days' time. Or if anybody
>> > looks at the error and sees an easy way to get the OSDs to start up,
>> > then bonus!
>> >
>> > However, I thought I would post in case there is any interest in
>> > trying to diagnose why this happened. There were no power or
>> > networking issues and no hard reboots, so this is purely contained
>> > within the Ceph OSD process.
>> >
>> > The objects that it claims are missing are from the RBD that had the
>> > snapshot deleted. I'm guessing that the last command before the OSD
>> > died was to delete those two objects, and that the deletion did
>> > actually happen, but for some reason the OSD died before it got
>> > confirmation. And now it's trying to delete them again, but they
>> > don't exist.
>> >
>> > I have the full debug 20 log, but pretty much all the lines above the
>> > snippet below just show it deleting thousands of objects without any problems.
>> >
>> > Nick
>> >
>> >  -4> 2016-11-15 09:46:52.061643 7f728f9368c0 20 read_log 6 divergent_priors
>> >     -3> 2016-11-15 09:46:52.061779 7f728f9368c0 10 read_log checking for missing items over interval (0'0,1607344'260104]
>> >     -2> 2016-11-15 09:46:52.069987 7f728f9368c0 15 read_log  missing
>> > 1553246'255377,1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:head
>> >     -1> 2016-11-15 09:46:52.070007 7f728f9368c0 15 read_log  missing
>> > 1553190'255366,1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:6c
>> >      0> 2016-11-15 09:46:52.071471 7f728f9368c0 -1 osd/PGLog.cc: In
>> > function 'static void PGLog::read_log(ObjectStore*, coll_t, coll_t,
>> > ghobject_t, const pg_info_t&, std::map<eversion_t, hobject_t>&,
>> > PGLog::IndexedLog&, pg_missing_t&, std::ostringstream&, const
>> > DoutPrefixProvider*, std::set<std::__cxx11::basic_string<char> >*)'
>> > thread 7f728f9368c0 time 2016-11-15 09:46:52.070023
>> > osd/PGLog.cc: 1047: FAILED assert(oi.version == i->first)
>> >
>> >  ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> > const*)+0x80) [0x5642d2734ea0]
>> >  2: (PGLog::read_log(ObjectStore*, coll_t, coll_t, ghobject_t,
>> > pg_info_t const&, std::map<eversion_t, hobject_t,
>> > std::less<eversion_t>, std::allocator<std::pair<eversion_t const,
>> > hobject_t> > >&, PGLog::IndexedLog&, pg_missing_t&,
>> > std::__cxx11::basic_ostringstream<char, std::char_traits<char>,
>> > std::allocator<char> >&, DoutPrefixProvider const*,
>> > std::set<std::__cxx11::basic_string<char, std::char_traits<char>,
>> > std::allocator<char> >, std::less<std::__cxx11::basic_string<char,
>> > std::char_traits<char>, std::allocator<char> > >,
>> > std::allocator<std::__cxx11::basic_string<char,
>> > std::char_traits<char>, std::allocator<char> > > >*)+0x719)
>> > [0x5642d22e2fd9]
>> >  3: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x2f6)
>> > [0x5642d21172d6]
>> >  4: (OSD::load_pgs()+0x87d) [0x5642d205345d]
>> >  5: (OSD::init()+0x2026) [0x5642d205e7a6]
>> >  6: (main()+0x2ea5) [0x5642d1fd08f5]
>> >  7: (__libc_start_main()+0xf0) [0x7f728c77c830]
>> >  8: (_start()+0x29) [0x5642d2011f89]
>> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
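For anyone triaging a similar crash, the `read_log  missing` entries in the quoted log can be pulled out of a full debug log with a short script. This is only a minimal sketch: it assumes the entries appear on a single line in the real log file (they are wrapped by the mail client above) and that each entry has the form `<epoch>'<version>,<object>`. The names `MissingEntry` and `parse_missing` are hypothetical helpers, not part of Ceph.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class MissingEntry(NamedTuple):
    """One 'read_log  missing' entry (hypothetical helper type)."""
    epoch: int
    version: int
    obj: str


# Assumed layout: "... read_log  missing <epoch>'<version>,<object>"
_MISSING_RE = re.compile(r"read_log\s+missing\s+(\d+)'(\d+),(\S+)")


def parse_missing(lines: Iterable[str]) -> Iterator[MissingEntry]:
    """Yield one MissingEntry per log line that reports a missing object."""
    for line in lines:
        match = _MISSING_RE.search(line)
        if match:
            yield MissingEntry(int(match.group(1)), int(match.group(2)), match.group(3))


# The log lines from the quoted message, unwrapped onto single lines.
SAMPLE = [
    "    -3> 2016-11-15 09:46:52.061779 7f728f9368c0 10 read_log checking for "
    "missing items over interval (0'0,1607344'260104]",
    "    -2> 2016-11-15 09:46:52.069987 7f728f9368c0 15 read_log  missing "
    "1553246'255377,1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:head",
    "    -1> 2016-11-15 09:46:52.070007 7f728f9368c0 15 read_log  missing "
    "1553190'255366,1:96e51ad6:::rbd_data.6fd18238e1f29.00000000002555c5:6c",
]

if __name__ == "__main__":
    for entry in parse_missing(SAMPLE):
        print(entry.epoch, entry.version, entry.obj)
```

On the sample above this picks out the two entries for `rbd_data.6fd18238e1f29.00000000002555c5` (the `head` object and its `6c` snapshot clone) and skips the "checking for missing items" line.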


