Re: Ceph Luminous - OSD constantly crashing caused by corrupted placement group

On 17.05.2018 at 00:12, Gregory Farnum wrote:


I'm a bit confused. Are you saying that
1) the ceph-objectstore-tool you pasted there successfully removed pg 5.9b from osd.130 (as it appears), AND
Yes. The ceph-osd process for osd.130 was not running during that phase.
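(For readers who missed the earlier mail: the removal followed the usual ceph-objectstore-tool pattern, roughly the sketch below. The data path is illustrative for our setup, and newer 12.2.x builds may additionally require --force for the remove op.)

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-130 \
        --pgid 5.9b --op remove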
2) pg 5.9b was active with one of the other OSDs as primary, so all data remained available, AND
Yes. pg 5.9b has been active the whole time (on two other OSDs). I think osd.19 is the primary for that pg.
"ceph pg 5.9b query" tells me:
.....
    "up": [
        19,
        166
    ],
    "acting": [
        19,
        166
    ],
    "actingbackfill": [
        "19",
        "166"
    ],
....
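(The full query output is long; something like the following pulls out just those fields, assuming jq is available on the host you run it from:)

    ceph pg 5.9b query | jq '{up: .up, acting: .acting}'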

3) when pg 5.9b got backfilled into osd.130, osd.130 crashed again? (But the other OSDs kept the PG fully available, without crashing?)
Yes.

It crashes again with the following lines in the OSD log:
    -2> 2018-05-16 11:11:59.639980 7fe812ffd700  5 -- 10.7.2.141:6800/173031 >> 10.7.2.49:6836/3920 conn(0x5619ed76c000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=24047 cs=1 l=0). rx osd.19 seq 24 0x5619eebd6d00 pg_backfill(progress 5.9b e 505567/505567 lb 5:d97d84eb:::rbd_data.112913b238e1f29.0000000000000ba3:56c06) v3
    -1> 2018-05-16 11:11:59.639995 7fe812ffd700  1 -- 10.7.2.141:6800/173031 <== osd.19 10.7.2.49:6836/3920 24 ==== pg_backfill(progress 5.9b e 505567/505567 lb 5:d97d84eb:::rbd_data.112913b238e1f29.0000000000000ba3:56c06) v3 ==== 955+0+0 (3741758263 0 0) 0x5619eebd6d00 con 0x5619ed76c000
     0> 2018-05-16 11:11:59.645952 7fe7fe7eb700 -1 /build/ceph-12.2.5/src/osd/PrimaryLogPG.cc: In function 'virtual void PrimaryLogPG::on_local_recover(const hobject_t&, const ObjectRecoveryInfo&, ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7fe7fe7eb700 time 2018-05-16 11:11:59.640238
/build/ceph-12.2.5/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != recovery_info.ss.clone_snaps.end())

 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x5619c11b1a02]
 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo const&, std::shared_ptr<ObjectContext>, bool, ObjectStore::Transaction*)+0xd63) [0x5619c0d1f873]
 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, PushReplyOp*, ObjectStore::Transaction*)+0x2da) [0x5619c0eb15ca]
 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x12e) [0x5619c0eb17fe]
 5: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2c1) [0x5619c0ec0d71]
 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x5619c0dcc440]
 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x543) [0x5619c0d30853]
 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a9) [0x5619c0ba7539]
 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x5619c0e50f37]
 10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x1047) [0x5619c0bd5847]
 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x5619c11b67f4]
 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5619c11b9830]
 13: (()+0x76ba) [0x7fe8173746ba]
 14: (clone()+0x6d) [0x7fe8163eb41d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


That sequence of events is *deeply* confusing and I really don't understand how it might happen.

Sadly I don't think you can grab a PG for export without stopping the OSD in question.
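(For reference, an offline export of the PG would then look roughly like the sketch below; paths and service names are illustrative, with noout set so the cluster does not start rebalancing while the OSD is down.)

    ceph osd set noout
    systemctl stop ceph-osd@130
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-130 \
        --pgid 5.9b --op export --file /root/pg-5.9b.export
    systemctl start ceph-osd@130
    ceph osd unset noout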
 

When we query the pg, we can see a large "snap_trimq".
Can this be cleaned somehow, even if the pg is undersized and degraded?

I *think* the PG will keep trimming snapshots even if undersized+degraded (though I don't remember for sure), but snapshot trimming is often heavily throttled and I'm not aware of any way to specifically push one PG to the front. If you're interested in speeding snaptrimming up you can search the archives or check the docs for the appropriate config options.
-Greg

    Ok. I think we should try that next.
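    (For reference, these are the throttling knobs we plan to look at; a sketch with option names as we understand them in Luminous, to be double-checked against the docs. "ceph daemon" has to be run on the host of the given OSD.)

        # inspect current values on one OSD
        ceph daemon osd.19 config get osd_snap_trim_sleep
        ceph daemon osd.19 config get osd_pg_max_concurrent_snap_trims
        ceph daemon osd.19 config get osd_max_trimming_pgs

        # loosen the throttle cluster-wide at runtime
        ceph tell osd.* injectargs '--osd_pg_max_concurrent_snap_trims 4'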

Thank you!




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
