FAILED assert(p != recovery_info.ss.clone_snaps.end())

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

After a RBD snapshot was removed, I seem to be having OSD's assert when they try and recover pg 1.2ca. The issue seems to follow the
PG around as OSD's fail. I've seen this bug tracker and associated mailing list post, but would appreciate if anyone can give any
pointers. https://tracker.ceph.com/issues/23030,

Cluster is 12.2.5 with Filestore and 3x replica pools

I noticed after the snapshot was removed that there were 2 inconsistent PG's. I can a repair on both and one of them (1.2ca) seems
to have triggered this issue.


Snippet of log from two different OSD's. 1st one is from the original OSD holding the PG, 2nd is from where the OSD was marked out
and it was trying to be recovered to:


    -4> 2018-06-05 15:15:45.997225 7febce7a5700  1 -- [2a03:25e0:254:5::112]:6819/3544315 --> [2a03:25e0:253:5::14]:0/3307345 --
osd_ping(ping_reply e2196479 stamp 2018-06-05 15:15:45.994907) v4 -- 0x557d3d72f800 con 0
    -3> 2018-06-05 15:15:46.018088 7febb2954700  1 -- [2a03:25e0:254:5::112]:6817/3544315 --> [2a03:25e0:254:5::12]:6809/5784710 --
MOSDPGPull(1.2ca e2196479/2196477 cost 8389608) v3 -- 0x557d4d180b40 con 0
    -2> 2018-06-05 15:15:46.018412 7febce7a5700  5 -- [2a03:25e0:254:5::112]:6817/3544315 >> [2a03:25e0:254:5::12]:6809/5784710
conn(0x557d4b1a9000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=13470 cs=1 l=0).
rx osd.46 seq 13 0x557d4d180b40 MOSDPGPush(1.2ca 2196479/2196477 [PushOp(1:534b0c9f:::rbd_data.0c4c14238e1f29.00000000000bf479:1c,
version: 2195927'1249660, data_included: [], data_size: 0, omap_header_size: 0, omap_ent
ries_size: 0, attrset_size: 1, recovery_info:
ObjectRecoveryInfo(1:534b0c9f:::rbd_data.0c4c14238e1f29.00000000000bf479:1c@2195927'1249660, size: 0, copy_subset: [], clone_subset:
{}, snapset: 1c=[]:{}), after_progress:
ObjectRecoveryProgress(!first, data_recovered_to:0, data_complete:true, omap_recovered_to:, omap_complete:true, error:false),
before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap
_recovered_to:, omap_complete:false, error:false))]) v3
    -1> 2018-06-05 15:15:46.018425 7febce7a5700  1 -- [2a03:25e0:254:5::112]:6817/3544315 <== osd.46
[2a03:25e0:254:5::12]:6809/5784710 13 ==== MOSDPGPush(1.2ca 2196479/2196477 [PushOp(1:534b0c9f:::rbd_data.0c4c14238e1f
29.00000000000bf479:1c, version: 2195927'1249660, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0,
attrset_size: 1, recovery_info: ObjectRecoveryInfo(1:534b0c9f:::rbd_data.0c4c14238e1f29.00000
000000bf479:1c@2195927'1249660, size: 0, copy_subset: [], clone_subset: {}, snapset: 1c=[]:{}), after_progress:
ObjectRecoveryProgress(!first, data_recovered_to:0, data_complete:true, omap_recovered_to:, omap_complete:t
rue, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:,
omap_complete:false, error:false))]) v3 ==== 885+0+0 (695790480 0 0) 0x557d4d180b40 con 0x5
57d4b1a9000
     0> 2018-06-05 15:15:46.022099 7febb2954700 -1 /build/ceph-12.2.5/src/osd/PrimaryLogPG.cc: In function 'virtual void
PrimaryLogPG::on_local_recover(const hobject_t&, const ObjectRecoveryInfo&, ObjectContextRef, bool
, ObjectStore::Transaction*)' thread 7febb2954700 time 2018-06-05 15:15:46.019130
/build/ceph-12.2.5/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != recovery_info.ss.clone_snaps.end())







    -4> 2018-06-05 16:28:59.560140 7fcd7b655700  5 -- [2a03:25e0:254:5::113]:6829/525383 >> [2a03:25e0:254:5::12]:6809/5784710
conn(0x557447510800 :6829 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=13524 cs
=1 l=0). rx osd.46 seq 6 0x5574480d0d80 MOSDPGPush(1.2ca 2196813/2196812
[PushOp(1:534b0c9f:::rbd_data.0c4c14238e1f29.00000000000bf479:1c, version: 2195927'1249660, data_included: [], data_size: 0,
omap_header_s
ize: 0, omap_entries_size: 0, attrset_size: 1, recovery_info:
ObjectRecoveryInfo(1:534b0c9f:::rbd_data.0c4c14238e1f29.00000000000bf479:1c@2195927'1249660, size: 4194304, copy_subset: [],
clone_subset: {}, snapse
t: 1c=[]:{}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0, data_complete:true, omap_recovered_to:,
omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_rec
overed_to:0, data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) v3
    -3> 2018-06-05 16:28:59.560155 7fcd7b655700  1 -- [2a03:25e0:254:5::113]:6829/525383 <== osd.46
[2a03:25e0:254:5::12]:6809/5784710 6 ==== MOSDPGPush(1.2ca 2196813/2196812 [PushOp(1:534b0c9f:::rbd_data.0c4c14
238e1f29.00000000000bf479:1c, version: 2195927'1249660, data_included: [], data_size: 0, omap_header_size: 0, omap_entries_size: 0,
attrset_size: 1, recovery_info: ObjectRecoveryInfo(1:534b0c9f:::rbd_data.0c4c14
238e1f29.00000000000bf479:1c@2195927'1249660, size: 4194304, copy_subset: [], clone_subset: {}, snapset: 1c=[]:{}), after_progress:
ObjectRecoveryProgress(!first, data_recovered_to:0, data_complete:true, omap_re
covered_to:, omap_complete:true, error:false), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0,
data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) v3 ==== 909+0+0 (7
22394556 0 0) 0x5574480d0d80 con 0x557447510800
    -2> 2018-06-05 16:28:59.560183 7fcd7b655700  5 -- [2a03:25e0:254:5::113]:6829/525383 >> [2a03:25e0:254:5::12]:6809/5784710
conn(0x557447510800 :6829 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=13524 cs
=1 l=0). rx osd.46 seq 7 0x55744783f900 pg_backfill(progress 1.2ca e 2196813/2196813 lb
1:534b0b88:::rbd_data.f870ac238e1f29.00000000000ff145:head) v3
    -1> 2018-06-05 16:28:59.560189 7fcd7b655700  1 -- [2a03:25e0:254:5::113]:6829/525383 <== osd.46
[2a03:25e0:254:5::12]:6809/5784710 7 ==== pg_backfill(progress 1.2ca e 2196813/2196813 lb 1:534b0b88:::rbd_data
.f870ac238e1f29.00000000000ff145:head) v3 ==== 946+0+0 (3865576583 0 0) 0x55744783f900 con 0x557447510800
     0> 2018-06-05 16:28:59.564054 7fcd5f3ba700 -1 /build/ceph-12.2.5/src/osd/PrimaryLogPG.cc: In function 'virtual void
PrimaryLogPG::on_local_recover(const hobject_t&, const ObjectRecoveryInfo&, ObjectContextR
ef, bool, ObjectStore::Transaction*)' thread 7fcd5f3ba700 time 2018-06-05 16:28:59.561060
/build/ceph-12.2.5/src/osd/PrimaryLogPG.cc: 358: FAILED assert(p != recovery_info.ss.clone_snaps.end())

 ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x557424971a02]
 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo const&, std::shared_ptr<ObjectContext>, bool,
ObjectStore::Transaction*)+0xd63) [0x5574244df873]
 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, PushReplyOp*, ObjectStore::Transaction*)+0x2da) [0x5574246715ca]
 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x12e) [0x5574246717fe]
 5: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x2c1) [0x557424680d71]
 6: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x50) [0x55742458c440]
 7: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x543) [0x5574244f0853]
 8: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3a9) [0x557424367539]
 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x57) [0x557424610f37]
 10: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x1047) [0x557424395847]
 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x884) [0x5574249767f4]
 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x557424979830]
 13: (()+0x76ba) [0x7fcd7f1cb6ba]
 14: (clone()+0x6d) [0x7fcd7e24241d]

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



[Index of Archives]     [Information on CEPH]     [Linux Filesystem Development]     [Ceph Development]     [Ceph Large]     [Linux USB Development]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [xfs]


  Powered by Linux