Re: Recovery from 12.2.5 (corruption) -> 12.2.6 (hair on fire) -> 13.2.0 (some objects inaccessible and CephFS damaged)

On Thu, Jul 19, 2018 at 2:48 AM, Troy Ablan <tablan@xxxxxxxxx> wrote:
>
>
> On 07/17/2018 11:14 PM, Brad Hubbard wrote:
>>
>> On Wed, Jul 18, 2018 at 2:57 AM, Troy Ablan <tablan@xxxxxxxxx> wrote:
>>>
>>> I was on 12.2.5 for a couple weeks and started randomly seeing
>>> corruption, moved to 12.2.6 via yum update on Sunday, and all hell broke
>>> loose.  I panicked and moved to Mimic, and when that didn't solve the
>>> problem, only then did I start to root around in the mailing list archives.
>>>
>>> It appears I can't downgrade OSDs back to Luminous now that 12.2.7 is
>>> out, but I'm unsure how to proceed now that the damaged cluster is
>>> running under Mimic.  Is there anything I can do to get the cluster back
>>> online and objects readable?
>>
>> That depends on what the specific problem is. Can you provide some
>> data that fills in the blanks around "randomly seeing corruption"?
>>
> Thanks for the reply, Brad.  I have a feeling that almost all of this stems
> from the time the cluster spent running 12.2.6.  VMs that use rbd as a
> backing store typically hit I/O errors during boot and cannot read critical
> parts of their images.  I also get similar errors if I try to rbd export
> most of the images.  Also, CephFS will not start, as ceph -s indicates
> damage.
>
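> For reference, the kind of export I've been attempting looks like this
> (pool, image, and snapshot names here are placeholders for my real ones):
>
>   rbd snap ls rbd/vm-disk-1
>   rbd export rbd/vm-disk-1@known-good /mnt/backup/vm-disk-1.img
>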
> Many of the OSDs have been crashing and restarting as I've tried to rbd
> export good versions of images (from older snapshots).  Here's one
> particular crash:
>
> 2018-07-18 15:52:15.809 7fcbaab77700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.0/rpm/el7/BUILD/ceph-13.2.0/src/os/bluestore/BlueStore.h: In function 'void BlueStore::SharedBlobSet::remove_last(BlueStore::SharedBlob*)' thread 7fcbaab77700 time 2018-07-18 15:52:15.750916
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.0/rpm/el7/BUILD/ceph-13.2.0/src/os/bluestore/BlueStore.h: 455: FAILED assert(sb->nref == 0)
>
>  ceph version 13.2.0 (79a10589f1f80dfe21e8f9794365ed98143071c4) mimic
> (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0xff) [0x7fcbc197a53f]
>  2: (()+0x286727) [0x7fcbc197a727]
>  3: (BlueStore::SharedBlob::put()+0x1da) [0x5641f39181ca]
>  4: (std::_Rb_tree<boost::intrusive_ptr<BlueStore::SharedBlob>,
> boost::intrusive_ptr<BlueStore::SharedBlob>,
> std::_Identity<boost::intrusive_ptr<BlueStore::SharedBlob> >,
> std::less<boost::intrusive_ptr<BlueStore::SharedBlob> >,
> std::allocator<boost::intrusive_ptr<BlueStore::SharedBlob> >
> >::_M_erase(std::_Rb_tree_node<boost::intrusive_ptr<BlueStore::SharedBlob> >*)+0x2d) [0x5641f3977cfd]
>  5: (std::_Rb_tree<boost::intrusive_ptr<BlueStore::SharedBlob>,
> boost::intrusive_ptr<BlueStore::SharedBlob>,
> std::_Identity<boost::intrusive_ptr<BlueStore::SharedBlob> >,
> std::less<boost::intrusive_ptr<BlueStore::SharedBlob> >,
> std::allocator<boost::intrusive_ptr<BlueStore::SharedBlob> >
> >::_M_erase(std::_Rb_tree_node<boost::intrusive_ptr<BlueStore::SharedBlob> >*)+0x1b) [0x5641f3977ceb]
>  6: (std::_Rb_tree<boost::intrusive_ptr<BlueStore::SharedBlob>,
> boost::intrusive_ptr<BlueStore::SharedBlob>,
> std::_Identity<boost::intrusive_ptr<BlueStore::SharedBlob> >,
> std::less<boost::intrusive_ptr<BlueStore::SharedBlob> >,
> std::allocator<boost::intrusive_ptr<BlueStore::SharedBlob> >
> >::_M_erase(std::_Rb_tree_node<boost::intrusive_ptr<BlueStore::SharedBlob> >*)+0x1b) [0x5641f3977ceb]
>  7: (std::_Rb_tree<boost::intrusive_ptr<BlueStore::SharedBlob>,
> boost::intrusive_ptr<BlueStore::SharedBlob>,
> std::_Identity<boost::intrusive_ptr<BlueStore::SharedBlob> >,
> std::less<boost::intrusive_ptr<BlueStore::SharedBlob> >,
> std::allocator<boost::intrusive_ptr<BlueStore::SharedBlob> >
> >::_M_erase(std::_Rb_tree_node<boost::intrusive_ptr<BlueStore::SharedBlob> >*)+0x1b) [0x5641f3977ceb]
>  8: (BlueStore::TransContext::~TransContext()+0xf7) [0x5641f3979297]
>  9: (BlueStore::_txc_finish(BlueStore::TransContext*)+0x610)
> [0x5641f391c9b0]
>  10: (BlueStore::_txc_state_proc(BlueStore::TransContext*)+0x9a)
> [0x5641f392a38a]
>  11: (BlueStore::_kv_finalize_thread()+0x41e) [0x5641f392b3be]
>  12: (BlueStore::KVFinalizeThread::entry()+0xd) [0x5641f397d85d]
>  13: (()+0x7e25) [0x7fcbbe4d2e25]
>  14: (clone()+0x6d) [0x7fcbbd5c3bad]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
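>
> (If it would help, I can produce a disassembly of the packaged binary,
> e.g. -- assuming the default RPM install path for the OSD daemon:
>
>   objdump -rdS /usr/bin/ceph-osd > ceph-osd-13.2.0.dis
>
> and pull out the section around the crash.)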
>
>
> Here's the output of ceph -s, which might fill in some configuration
> questions.  Since OSDs restart continually whenever I put load on the
> cluster, it churns quite a bit; that's why I set nodown for now.
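>
> (Specifically, the flags were set with the usual commands, along the
> lines of:
>
>   ceph osd set nodown
>   ceph osd set noscrub
>   ceph osd set nodeep-scrub
>
> so that flapping OSDs wouldn't keep forcing re-peering.)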
>
>   cluster:
>     id:     b2873c9a-5539-4c76-ac4a-a6c9829bfed2
>     health: HEALTH_ERR
>             1 filesystem is degraded
>             1 filesystem is offline
>             1 mds daemon damaged
>             nodown,noscrub,nodeep-scrub flag(s) set
>             9 scrub errors
>             Reduced data availability: 61 pgs inactive, 56 pgs peering, 4
> pgs stale
>             Possible data damage: 3 pgs inconsistent
>             16 slow requests are blocked > 32 sec
>             26 stuck requests are blocked > 4096 sec
>
>   services:
>     mon: 5 daemons, quorum a,b,c,d,e
>     mgr: a(active), standbys: b, d, e, c
>     mds: lcs-0/1/1 up , 2 up:standby, 1 damaged
>     osd: 34 osds: 34 up, 34 in
>          flags nodown,noscrub,nodeep-scrub
>
>   data:
>     pools:   15 pools, 640 pgs
>     objects: 9.73 M objects, 13 TiB
>     usage:   24 TiB used, 55 TiB / 79 TiB avail
>     pgs:     23.438% pgs not active
>              487 active+clean
>              73  peering
>              70  activating
>              5   stale+peering
>              3   active+clean+inconsistent
>              2   stale+activating
>
>   io:
>     client:   1.3 KiB/s wr, 0 op/s rd, 0 op/s wr
>
>
> If there's any other information I can provide that might help pinpoint the
> problem, I'd be glad to share.
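>
> For example, I could start with something like:
>
>   ceph health detail
>   rados list-inconsistent-pg <pool>
>   rados list-inconsistent-obj <pgid>
>
> to enumerate the 3 inconsistent PGs and the objects behind the 9 scrub
> errors.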

If you leave the cluster to recover, what point does it get to (ceph -s output)?
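For example, something like:

  watch -n 30 ceph -s

until the pg states stop changing, then paste the final ceph -s and
ceph health detail.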

-- 
Cheers,
Brad