Bluestore OSD died - error (39) Directory not empty not handled on operation 21

Stephen Lord <Steve.Lord@xxxxxxxxxxx> · Tue, 5 Apr 2016 17:28:11 +0000

I was experimenting with using bluestore OSDs and appear to have found a fairly consistent way to crash them…

Reducing the number of copies in a pool from 3 to 1 has now twice crashed every OSD backing that pool. In one case the pool was a cache tier; in the other it was a plain pool hosting rbd images.

From the log file of one of the OSDs:

2016-04-05 12:09:54.272475 7f5a58027700  0 bluestore(/var/lib/ceph/osd/ceph-43)  error (39) Directory not empty not handled on operation 21 (op 1, counting from 0)
2016-04-05 12:09:54.272489 7f5a58027700  0 bluestore(/var/lib/ceph/osd/ceph-43)  transaction dump:
{
    "ops": [
        {
            "op_num": 0,
            "op_name": "remove",
            "collection": "2.354_head",
            "oid": "#2:2ac00000::::head#"
        },
        {
            "op_num": 1,
            "op_name": "rmcoll",
            "collection": "2.354_head"
        }
    ]
}

2016-04-05 12:09:54.275114 7f5a58027700 -1 os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7f5a58027700 time 2016-04-05 12:09:54.272532
os/bluestore/BlueStore.cc: 4357: FAILED assert(0 == "unexpected error")

 ceph version 10.1.0 (96ae8bd25f31862dbd5302f304ebf8bf1166aba6)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x85) [0x7f5a82e74a55]
 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x77a) [0x7f5a82b02eba]
 3: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, std::shared_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x3a5) [0x7f5a82b056e5]
 4: (ObjectStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, Context*, Context*, Context*, Context*, std::shared_ptr<TrackedOp>)+0x2a6) [0x7f5a82aad0b6]
 5: (OSD::RemoveWQ::_process(std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> >, ThreadPool::TPHandle&)+0x6e4) [0x7f5a827debb4]
 6: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> >, std::pair<boost::intrusive_ptr<PG>, std::shared_ptr<DeletingState> > >::_void_process(void*, ThreadPool::TPHandle&)+0x11a) [0x7f5a8283a15a]
 7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa7e) [0x7f5a82e65a9e]
 8: (ThreadPool::WorkThread::entry()+0x10) [0x7f5a82e66980]
 9: (()+0x7dc5) [0x7f5a80dbedc5]
 10: (clone()+0x6d) [0x7f5a7f44a28d]
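
The failing op and the error number in the log above can be decoded with a few lines of Python. This is a minimal sketch for reading the dump, not anything taken from Ceph itself: the helper name describe_failure is hypothetical, and the symbolic errno name depends on the platform the script runs on (the log's own text for error 39 is "Directory not empty").

```python
import errno
import json
import os

# Transaction dump copied verbatim from the OSD log.
TRANSACTION_DUMP = """
{
    "ops": [
        {
            "op_num": 0,
            "op_name": "remove",
            "collection": "2.354_head",
            "oid": "#2:2ac00000::::head#"
        },
        {
            "op_num": 1,
            "op_name": "rmcoll",
            "collection": "2.354_head"
        }
    ]
}
"""


def describe_failure(dump_text, failed_index, err):
    """Summarise which op in the dump failed and what the errno means.

    Hypothetical helper for illustration only.
    """
    ops = json.loads(dump_text)["ops"]
    op = ops[failed_index]
    return {
        "op_name": op["op_name"],
        "collection": op["collection"],
        # Symbolic name for the error number on the local platform.
        "errno_name": errno.errorcode.get(err, "UNKNOWN"),
        "errno_text": os.strerror(err),
        # Earlier ops in the same transaction that touch the same collection.
        "earlier_ops": [o["op_name"] for o in ops[:failed_index]
                        if o["collection"] == op["collection"]],
    }


if __name__ == "__main__":
    # "op 1, counting from 0" and "error (39)" come from the log line.
    print(describe_failure(TRANSACTION_DUMP, 1, 39))
```

Read this way, the transaction removes one object from 2.354_head and then tries to remove the collection itself, and it is the rmcoll that is rejected because the collection is reported as not empty.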

In both cases a replicated pool with 3 copies was created, some content was added, and then the number of copies was set down to 1. Not a common thing to do, I know, but this works on FileStore OSDs.
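
The sequence described above can be sketched as a small script. It is an illustration of the steps, not the exact commands that were run: the pool name, the PG count, and the use of rados bench to add content are all assumptions. By default it only prints the commands; pass dry_run=False to execute them against a test cluster.

```python
import subprocess


def repro_commands(pool="bstest", pg_num=64, seconds=10):
    """Build the command sequence: create pool, fill it, drop size to 1.

    pool, pg_num and seconds are hypothetical values for illustration.
    """
    return [
        # Create a replicated pool and make the 3-copy setting explicit.
        ["ceph", "osd", "pool", "create", pool, str(pg_num)],
        ["ceph", "osd", "pool", "set", pool, "size", "3"],
        # Add some content (assumption: any write workload would do).
        ["rados", "-p", pool, "bench", str(seconds), "write", "--no-cleanup"],
        # Reduce the number of copies from 3 to 1.
        ["ceph", "osd", "pool", "set", pool, "size", "1"],
    ]


def run(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)


if __name__ == "__main__":
    run(repro_commands())
```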

This is a cluster deployed using the Red Hat 7 Jewel (10.1) RPMs from download.ceph.com.

Steve
