Problem with OSD down and problematic rbd object

Hi all,

yesterday one of my OSDs went down with this error:

2018-01-04 06:47:25.304513 7fe6eda51700 -1 log_channel(cluster) log [ERR] : 6.20 repair 1 missing, 0 inconsistent objects
2018-01-04 06:47:25.312861 7fe6eda51700 -1 log_channel(cluster) log [ERR] : 6.20 repair 3 errors, 2 fixed
2018-01-04 06:47:26.796659 7fe6eda51700 -1 /build/ceph-12.2.1/src/osd/PrimaryLogPG.cc: In function 'virtual void PrimaryLogPG::on_local_recover(const hobject_t&, const ObjectRecoveryInfo&, ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7fe6eda51700 time 2018-01-04 06:47:26.649174
/build/ceph-12.2.1/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != recovery_info.ss.clone_snaps.end())

 ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x562994121de2]
 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo const&, std::shared_ptr<ObjectContext>, bool, ObjectStore::Transaction*)+0x11f0) [0x562993ccec10]
 3: (ReplicatedBackend::handle_pull_response(pg_shard_t, PushOp const&, PullOp*, std::list<ReplicatedBackend::pull_complete_info, std::allocator<ReplicatedBackend::pull_complete_info> >*, ObjectStore::Transaction*)+0x788) [0x562993e4bb98]
 4: (ReplicatedBackend::_do_pull_response(boost::intrusive_ptr<OpRequest>)+0x2a6) [0x562993e4db36]
 5: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x214) [0x562993e50c04]
 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x562993d75ec0]
 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x77b) [0x562993ce265b]
 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3f7) [0x562993b749e7]
 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x562993de6ad7]
 10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x108c) [0x562993ba121c]
 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) [0x562994127a6d]
 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x562994129a30]
 13: (()+0x7494) [0x7fe706e36494]
 14: (clone()+0x3f) [0x7fe705ebdaff]


At first I was unable to start that OSD, but after a few hours a restart succeeded and the OSD came back up (I don't know why restarting it right after the failure was not enough, but later it was fine).

What does this mean? How can I prevent it?

A deep scrub after that showed that the object is missing on 2 OSDs. It seems to me that this object belongs to a deleted snapshot ("snap": 37, see below). Maybe something went wrong during snaptrim, the object was not deleted from one OSD, and now it shows up as inconsistent. How can I verify that it is safe to delete this object, i.e. that it is not part of any remaining RBD snapshot?

Below is the output of rados list-inconsistent-obj --format=json-pretty 6.20:

{
    "epoch": 7240,
    "inconsistents": [
        {
            "object": {
                "name": "rbd_data.967992ae8944a.000000000006b41f",
                "nspace": "",
                "locator": "",
                "snap": 37,
                "version": 649618
            },
            "errors": [],
            "union_shard_errors": [
                "missing"
            ],
            "selected_object_info": "6:0663e376:::rbd_data.967992ae8944a.000000000006b41f:25(6240'452090 osd.1.0:251266 dirty|data_digest|omap_digest s 4194304 uv 649618 dd e0468a41 od ffffffff alloc_hint [0 0 0])",
            "shards": [
                {
                    "osd": 1,
                    "primary": true,
                    "errors": [
                        "missing"
                    ]
                },
                {
                    "osd": 10,
                    "primary": false,
                    "errors": [
                        "missing"
                    ]
                },
                {
                    "osd": 14,
                    "primary": false,
                    "errors": [],
                    "size": 4194304,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0xe0468a41"
                }
            ]
        }
    ]
}
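
For reference, this is roughly how I would try to check whether the object still belongs to a live snapshot. It is only a sketch: <pool> and <image> are placeholders (I only know the pool id 6 from the PG name and the block name prefix rbd_data.967992ae8944a from the object name), so please correct me if there is a better way.

    # map pool id 6 to its name
    ceph osd lspools

    # find the image whose block_name_prefix matches the object name
    for img in $(rbd ls <pool>); do
        rbd info <pool>/$img | grep -q rbd_data.967992ae8944a && echo $img
    done

    # list the image's snapshots; the "snap": 37 from the scrub report
    # should appear in the SNAPID column if the snapshot still exists
    rbd snap ls <pool>/<image>

    # show which snapshot clones of the object are still recorded
    rados -p <pool> listsnaps rbd_data.967992ae8944a.000000000006b41f

If snap id 37 is not listed for any image and listsnaps only reports that one clone, I would take it as a sign that the object is a leftover from the trimmed snapshot, but I am not sure that is a safe conclusion.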
