Re: Simultaneous CEPH OSD crashes

Lionel Bouton <lionel-subscription@xxxxxxxxxxx> · Tue, 29 Sep 2015 16:42:57 +0200

On 27/09/2015 10:25, Lionel Bouton wrote:
> On 27/09/2015 09:15, Lionel Bouton wrote:
>> Hi,
>>
>> We just had near-simultaneous crashes on two different OSDs, which
>> blocked our VMs (min_size = 2, size = 3) on Firefly 0.80.9.
>>
>> The first OSD to go down had this error:
>>
>> 2015-09-27 06:30:33.257133 7f7ac7fef700 -1 os/FileStore.cc: In function
>> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
>> size_t, ceph::bufferlist&, bool)' thread 7f7ac7fef700 time 2015-09-27
>> 06:30:33.145251
>> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
>> || got != -5)
>>
>> The second OSD crash was similar:
>>
>> 2015-09-27 06:30:57.373841 7f05d92cf700 -1 os/FileStore.cc: In function
>> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
>> size_t, ceph::bufferlist&, bool)' thread 7f05d92cf700 time 2015-09-27
>> 06:30:57.260978
>> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
>> || got != -5)
>>
>> I'm familiar with this error: it already happened with a BTRFS read
>> error (invalid csum), and I could fix it by flushing the journal,
>> deleting the corrupted file, starting the OSD and running a pg repair.
>> This time, though, there isn't any kernel log entry indicating an
>> invalid csum. The kernel is different: we use 3.18.9 on these two
>> servers and the others had 4.0.5, so maybe BTRFS doesn't log invalid
>> checksum errors with this version. I've launched btrfs scrub on the 2
>> filesystems just in case (still waiting for completion).
>>
>> The first attempt to restart these OSDs failed: one OSD died 19 seconds
>> after start, the other 21 seconds. Seeing that, I temporarily lowered
>> min_size to 1, which allowed the 9 incomplete PGs to recover. I
>> verified this by setting min_size back to 2 and then restarted the 2
>> OSDs. They haven't crashed yet.
>>
>> For reference, the assert failures were still the same when the OSDs
>> died shortly after start:
>> 2015-09-27 08:20:19.332835 7f4467bd0700 -1 os/FileStore.cc: In function
>> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
>> size_t, ceph::bufferlist&, bool)' thread 7f4467bd0700 time 2015-09-27
>> 08:20:19.325126
>> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
>> || got != -5)
>>
>> 2015-09-27 08:20:50.626344 7f97f2d95700 -1 os/FileStore.cc: In function
>> 'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
>> size_t, ceph::bufferlist&, bool)' thread 7f97f2d95700 time 2015-09-27
>> 08:20:50.605234
>> os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
>> || got != -5)
>>
>> Note that at 2015-09-27 06:30:11 a deep-scrub started on a PG involving
>> one (and only one) of these 2 OSDs. As we evenly space deep-scrubs
>> (currently with a 10-minute interval), this might be relevant (or just
>> a coincidence).
>>
>> I kept copies of the Ceph OSD logs (including the stack trace and the
>> recent events) in case they are needed.
>>
>> Can anyone shed some light on why these OSDs died?
> I just had a thought. Could launching a defragmentation on a file in a
> BTRFS OSD filestore trigger this problem?

That's not it: we had another crash a couple of hours ago on one of the
two servers involved in the first crashes, and there was no concurrent
defragmentation going on.

2015-09-29 14:18:53.479881 7f8d78ff9700 -1 os/FileStore.cc: In function
'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t,
size_t, ceph::bufferlist&, bool)' thread 7f8d78ff9700 time 2015-09-29
14:18:53.425790
os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio
|| got != -5)

 ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)
 1: (FileStore::read(coll_t, ghobject_t const&, unsigned long, unsigned
long, ceph::buffer::list&, bool)+0x96a) [0x8917ea]
 2: (ReplicatedBackend::objects_read_sync(hobject_t const&, unsigned
long, unsigned long, ceph::buffer::list*)+0x81) [0x90ecc1]
 3: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*,
std::vector<OSDOp, std::allocator<OSDOp> >&)+0x6a81) [0x801091]
 4: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x63)
[0x809f23]
 5: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0xb6f) [0x80adbf]
 6: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x2ced) [0x815f4d]
 7: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>,
ThreadPool::TPHandle&)+0x70c) [0x7b047c]
 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x34a) [0x60c74a]
 9: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>,
ThreadPool::TPHandle&)+0x628) [0x628808]
 10: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>,
std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG>
>::_void_process(void*, ThreadPool::TPHandle&)+0x9c) [0x66ea8c]
 11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0xa60416]
 12: (ThreadPool::WorkThread::entry()+0x10) [0xa62430]
 13: (()+0x8217) [0x7f8dae984217]
 14: (clone()+0x6d) [0x7f8dad129f8d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
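
For what it's worth, here is how I read the failing assert, as a small
Python sketch. The function name read_assert_holds is hypothetical; only
the boolean expression comes from the log. I assume that -5 is -EIO (true
on Linux) and that the filestore_fail_eio option is enabled on these OSDs,
which I have not double-checked.

```python
import errno


def read_assert_holds(got: int, allow_eio: bool, fail_eio: bool = True) -> bool:
    """Mirror of: assert(allow_eio || !m_filestore_fail_eio || got != -5).

    got       -- return value of the underlying read (negative errno on error)
    allow_eio -- whether the caller tolerates EIO for this read
    fail_eio  -- assumed value of the filestore_fail_eio setting
    """
    return allow_eio or (not fail_eio) or got != -errno.EIO


# A read returning -EIO, with EIO not tolerated and fail_eio enabled,
# violates the expression: this is the case that aborts the OSD.
print(read_assert_holds(-errno.EIO, allow_eio=False, fail_eio=True))

# The same read passes the check if the caller allows EIO.
print(read_assert_holds(-errno.EIO, allow_eio=True, fail_eio=True))
```

If this reading is right, every crash above means a read on the
filestore returned EIO to the OSD, even though the kernel logged nothing.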

After the 2 previous crashes, I launched btrfs scrubs and they didn't
find any problem. Could someone help diagnose what is going on? Is it a
known bug?
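
In case it helps anyone comparing timings, below is a rough Python sketch
of how the crash times could be pulled out of an OSD log and compared with
the start of a deep-scrub. The regular expression and the crash_times
helper are my own assumptions about the log layout (one unwrapped line
per entry, as in the original log files, not as wrapped in this mail);
the sample line is the first crash quoted above.

```python
import re
from datetime import datetime

# Assumed layout of the assert header line in the OSD log:
# "<date> <time> <thread id> -1 os/FileStore.cc: In function '... FileStore::read("
HEADER = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6}) ([0-9a-f]+) -1 "
    r"os/FileStore\.cc: In function 'virtual int FileStore::read\("
)


def crash_times(lines):
    """Return (timestamp, thread id) for each FileStore::read assert header."""
    found = []
    for line in lines:
        match = HEADER.match(line)
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            found.append((when, match.group(2)))
    return found


sample = [
    "2015-09-27 06:30:33.257133 7f7ac7fef700 -1 os/FileStore.cc: In function "
    "'virtual int FileStore::read(coll_t, const ghobject_t&, uint64_t, "
    "size_t, ceph::bufferlist&, bool)' thread 7f7ac7fef700 time 2015-09-27 "
    "06:30:33.145251",
    "os/FileStore.cc: 2641: FAILED assert(allow_eio || !m_filestore_fail_eio "
    "|| got != -5)",
]

# Start of the deep-scrub mentioned in the first mail.
scrub_start = datetime(2015, 9, 27, 6, 30, 11)

for when, thread in crash_times(sample):
    # Seconds elapsed between the deep-scrub start and the crash.
    print(thread, (when - scrub_start).total_seconds())
```

Running the same function over the logs of both OSDs would show whether
each crash follows a scrub by a similar delay or whether the timing is a
coincidence.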

Best regards,

Lionel