It's an EIO. The OSD got an EIO from the underlying fs. That's what
causes those asserts. You probably want to redirect this to the relevant
fs mailing list.
-Sam

On Tue, Sep 29, 2015 at 7:42 AM, Lionel Bouton
<lionel-subscription@xxxxxxxxxxx> wrote:
> On 27/09/2015 10:25, Lionel Bouton wrote:
>> On 27/09/2015 09:15, Lionel Bouton wrote:
>>> Hi,
>>>
>>> we just had a quasi-simultaneous crash of two different OSDs which
>>> blocked our VMs (min_size = 2, size = 3) on Firefly 0.80.9.
>>>
>>> The first OSD to go down had this error:
>>>
>>> 2015-09-27 06:30:33.257133 7f7ac7fef700 -1 os/FileStore.cc: In function
>>> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
>>> size_t, ceph::bufferlist&, bool)' thread 7f7ac7fef700 time 2015-09-27
>>> 06:30:33.145251
>>> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
>>> || got != -5)
>>>
>>> The second OSD crash was similar:
>>>
>>> 2015-09-27 06:30:57.373841 7f05d92cf700 -1 os/FileStore.cc: In function
>>> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
>>> size_t, ceph::bufferlist&, bool)' thread 7f05d92cf700 time 2015-09-27
>>> 06:30:57.260978
>>> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
>>> || got != -5)
>>>
>>> I'm familiar with this error: it has already happened after a BTRFS
>>> read error (invalid csum), and I could correct it with a flush-journal,
>>> deleting the corrupted file, restarting the OSD and running a pg repair.
>>> This time, though, there isn't any kernel log indicating an invalid
>>> csum. The kernel is different, though: we use 3.18.9 on these two
>>> servers while the others run 4.0.5, so maybe BTRFS doesn't log invalid
>>> checksum errors with this version. I've launched btrfs scrub on the two
>>> filesystems just in case (still waiting for completion).
>>>
>>> The first attempt to restart these OSDs failed: one OSD died 19 seconds
>>> after starting, the other after 21 seconds. Seeing that, I temporarily
>>> lowered min_size to 1, which allowed the 9 incomplete PGs to recover. I
>>> verified this, brought min_size back up to 2 and then restarted the 2
>>> OSDs. They haven't crashed yet.
>>>
>>> For reference, the assert failures were still the same when the OSDs
>>> died shortly after starting:
>>> 2015-09-27 08:20:19.332835 7f4467bd0700 -1 os/FileStore.cc: In function
>>> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
>>> size_t, ceph::bufferlist&, bool)' thread 7f4467bd0700 time 2015-09-27
>>> 08:20:19.325126
>>> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
>>> || got != -5)
>>>
>>> 2015-09-27 08:20:50.626344 7f97f2d95700 -1 os/FileStore.cc: In function
>>> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
>>> size_t, ceph::bufferlist&, bool)' thread 7f97f2d95700 time 2015-09-27
>>> 08:20:50.605234
>>> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
>>> || got != -5)
>>>
>>> Note that at 2015-09-27 06:30:11 a deep-scrub started on a PG involving
>>> one (and only one) of these 2 OSDs. As we evenly space deep-scrubs
>>> (currently with a 10 minute interval), this might be relevant (or just
>>> a coincidence).
>>>
>>> I made copies of the Ceph OSD logs (including the stack trace and the
>>> recent events) if needed.
>>>
>>> Can anyone shed some light on why these OSDs died?
>> I just had a thought. Could launching a defragmentation on a file in a
>> BTRFS OSD filestore trigger this problem?
>
> That's not it: we had another crash a couple of hours ago on one of the
> two servers involved in the first crashes, and there was no concurrent
> defragmentation going on.
>
> 2015-09-29 14:18:53.479881 7f8d78ff9700 -1 os/FileStore.cc: In function
> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
> size_t, ceph::bufferlist&, bool)' thread 7f8d78ff9700 time 2015-09-29
> 14:18:53.425790
> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
> || got != -5)
>
> ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
> 1: (FileStore::read(coll_t, ghobject_t const&, unsigned long, unsigned
> long, ceph::buffer::list&, bool)+0x96a) [0x8917ea]
> 2: (ReplicatedBackend::objects_read_sync(hobject_t const&, unsigned
> long, unsigned long, ceph::buffer::list*)+0x81) [0x90ecc1]
> 3: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*,
> std::vector<OSDOp, std::allocator<OSDOp> >&)+0x6a81) [0x801091]
> 4: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x63)
> [0x809f23]
> 5: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0xb6f) [0x80adbf]
> 6: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x2ced) [0x815f4d]
> 7: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>,
> ThreadPool::TPHandle&)+0x70c) [0x7b047c]
> 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
> std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x34a) [0x60c74a]
> 9: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>,
> ThreadPool::TPHandle&)+0x628) [0x628808]
> 10: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>,
> std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG>
> >::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x66ea8c]
> 11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0xa60416]
> 12: (ThreadPool::WorkThread::entry()+0x10) [0xa62430]
> 13: (()+0x8217) [0x7f8dae984217]
> 14: (clone()+0x6d) [0x7f8dad129f8d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> For the 2 previous crashes, I launched btrfs scrubs and they couldn't
> find any problem. Could someone help diagnose what is going on? Is it a
> known bug?
>
> Best regards,
>
> Lionel
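
For anyone not familiar with the assert quoted in the traces above: "got" is
the return value of the low-level read, which follows the negative-errno
convention, so -5 is -EIO. Below is a minimal, self-contained sketch -- not
the actual Ceph FileStore code, just an illustration that reuses the flag
names from the assert -- of how an I/O error reported by the filesystem ends
up tripping a check of this shape and aborting the daemon.

// Illustrative only: not the real FileStore::read, same assert shape though.
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Stand-ins for the configuration flags named in the assert.
static const bool m_filestore_fail_eio = true;  // "crash on EIO" behaviour
static const bool allow_eio = false;            // caller did not opt in to EIO

// Read 'len' bytes at 'off' from 'path'; returns bytes read, or -errno on
// error, which is the convention 'got' follows (-5 == -EIO).
static int read_object(const char* path, char* buf, size_t len, off_t off) {
  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return -errno;
  ssize_t got = pread(fd, buf, len, off);
  int ret = (got < 0) ? -errno : static_cast<int>(got);
  close(fd);
  return ret;
}

int main(int argc, char** argv) {
  char buf[4096];
  const char* path = (argc > 1) ? argv[1] : "/tmp/testobj";
  int got = read_object(path, buf, sizeof(buf), 0);
  printf("read returned %d\n", got);

  // Same shape as the failed assert: it only passes when the caller allows
  // EIO, when the "fail on EIO" behaviour is disabled, or when the error is
  // anything other than -5 (EIO). A bad sector or csum error means got == -5,
  // so the assert fires and the daemon aborts, which is what the OSDs did.
  assert(allow_eio || !m_filestore_fail_eio || got != -5);
  return 0;
}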
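Since btrfs scrub reported no problems, one way to narrow things down would
be to re-read the suspect object file straight from the OSD's filestore
directory and see whether the kernel still returns EIO for some extent. The
following is only a hypothetical stand-alone helper (not a Ceph tool): it
scans a single file in 64 KiB chunks with pread() and reports any offset
where the read fails.

// Hypothetical diagnostic helper: scan one file and report unreadable chunks.
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char** argv) {
  if (argc != 2) {
    fprintf(stderr, "usage: %s <object-file>\n", argv[0]);
    return 1;
  }
  int fd = open(argv[1], O_RDONLY);
  if (fd < 0) {
    fprintf(stderr, "open: %s\n", strerror(errno));
    return 1;
  }
  char buf[1 << 16];            // 64 KiB chunks
  off_t off = 0;
  int bad = 0;
  for (;;) {
    ssize_t got = pread(fd, buf, sizeof(buf), off);
    if (got == 0)
      break;                    // end of file reached
    if (got < 0) {
      fprintf(stderr, "read error at offset %lld: %s\n",
              (long long)off, strerror(errno));
      bad++;
      off += sizeof(buf);       // skip past the unreadable chunk and continue
      continue;
    }
    off += got;
  }
  close(fd);
  printf("%d unreadable chunk(s)\n", bad);
  return bad ? 2 : 0;
}

Compiled with g++ and pointed at the object file the OSD was reading when it
crashed (under the OSD's current/ directory), this should reproduce the EIO
if the underlying device or filesystem is still returning it.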