> On 23 February 2017 at 19:09, george.vasilakakos@xxxxxxxxxx wrote:
> 
> 
> Since we need this pool to work again, we decided to take the data loss and try to move on.
> 
> So far, no luck. We tried a force create but, as expected, with a PG that is not peering this did absolutely nothing.

True, that only works for a stale PG.

> We also tried rm-past-intervals and remove from ceph-objectstore-tool and manually deleting the data directories on the disks. The PG remains down+remapped with two OSDs failing to join the acting set. These have been restarted multiple times to no avail.

So you removed the PG from all the OSDs? 595,1391,240,127,937,362,267,320,986,634,716?

> 
> # ceph pg map 1.323
> osdmap e23122 pg 1.323 (1.323) -> up [595,1391,240,127,937,362,267,320,986,634,716] acting [595,1391,240,127,937,362,267,320,986,2147483647,2147483647]
> 
> We have also seen some very odd behaviour.
> # ceph pg map 1.323
> osdmap e22909 pg 1.323 (1.323) -> up [595,1391,240,127,937,362,267,320,986,634,716] acting [595,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647]
> 
> Straight after a restart of all OSDs in the PG, and after everything else has settled down. From that state, restarting 595 results in:
> 
> # ceph pg map 1.323
> osdmap e22921 pg 1.323 (1.323) -> up [595,1391,240,127,937,362,267,320,986,634,716] acting [2147483647,1391,240,127,937,362,267,320,986,634,716]
> 
> Restarting 595 again doesn't change this. Another restart of all OSDs in the PG results in the state seen above, with the last two replaced by ITEM_NONE.
> 
> Another strange thing is that on osd.7 (the one originally at rank 8 that was restarted and caused this problem) the objectstore tool fails to remove the PG and crashes out:
> 
> # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op remove --pgid 1.323s8
> marking collection for removal
> setting '_remove' omap key
> finish_remove_pgs 1.323s8_head removing 1.323s8
> *** Caught signal (Aborted) **
> in thread 7fa713782700 thread_name:tp_fstore_op
> ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)
> 1: (()+0x97463a) [0x7fa71c47563a]
> 2: (()+0xf370) [0x7fa71935a370]
> 3: (snappy::RawUncompress(snappy::Source*, char*)+0x374) [0x7fa71abd0cd4]
> 4: (snappy::RawUncompress(char const*, unsigned long, char*)+0x3d) [0x7fa71abd0e2d]
> 5: (leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions const&, leveldb::BlockHandle const&, leveldb::BlockContents*)+0x35e) [0x7fa71b08007e]
> 6: (leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&, leveldb::Slice const&)+0x276) [0x7fa71b081196]
> 7: (()+0x3c820) [0x7fa71b083820]
> 8: (()+0x3c9cd) [0x7fa71b0839cd]
> 9: (()+0x3ca3e) [0x7fa71b083a3e]
> 10: (()+0x39c75) [0x7fa71b080c75]
> 11: (()+0x21e20) [0x7fa71b068e20]
> 12: (()+0x223c5) [0x7fa71b0693c5]
> 13: (LevelDBStore::LevelDBWholeSpaceIteratorImpl::seek_to_first(std::string const&)+0x3d) [0x7fa71c3ecb1d]
> 14: (LevelDBStore::LevelDBTransactionImpl::rmkeys_by_prefix(std::string const&)+0x138) [0x7fa71c3ec028]
> 15: (DBObjectMap::clear_header(std::shared_ptr<DBObjectMap::_Header>, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x1d0) [0x7fa71c400a40]
> 16: (DBObjectMap::_clear(std::shared_ptr<DBObjectMap::_Header>, std::shared_ptr<KeyValueDB::TransactionImpl>)+0xa1) [0x7fa71c401171]
> 17: (DBObjectMap::clear(ghobject_t const&, SequencerPosition const*)+0x1ff) [0x7fa71c4075bf]
> 18: (FileStore::lfn_unlink(coll_t const&, ghobject_t const&, SequencerPosition const&, bool)+0x241) [0x7fa71c2c0d41]
> 19: (FileStore::_remove(coll_t const&, ghobject_t const&, SequencerPosition const&)+0x8e) [0x7fa71c2c171e]
> 20: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0x433e) [0x7fa71c2d8c6e]
> 21: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, unsigned long, ThreadPool::TPHandle*)+0x3b) [0x7fa71c2db75b]
> 22: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x2cd) [0x7fa71c2dba5d]
> 23: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb59) [0x7fa71c63e189]
> 24: (ThreadPool::WorkThread::entry()+0x10) [0x7fa71c63f160]
> 25: (()+0x7dc5) [0x7fa719352dc5]
> 26: (clone()+0x6d) [0x7fa71843e73d]
> Aborted
> 
> At this point all we want to achieve is for the PG to peer again (and soon) without us having to delete the pool.
> 
> Any help would be appreciated...

First of all, my EC experience here is too limited to tell you exactly what is happening.

What you could do:

- Remove these OSDs from CRUSH
- Wait for recovery to complete
- Stop the OSDs
- Remove their cephx keys
- Mark them as lost

At this point PG 1.323 should go into an incomplete or stale state. You should then be able to force re-create it.

This worked for me with a replicated pool; I have never tried it with EC. Afterwards you can re-create these OSDs again.

Wido
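A rough sketch of that sequence on this release (osd.634 and osd.716 below are only placeholders for whichever OSDs you end up discarding; run the systemctl lines on the OSD hosts, the rest from an admin node):

$ ceph osd crush remove osd.634
$ ceph osd crush remove osd.716
  ... wait for recovery to complete ...
$ systemctl stop ceph-osd@634
$ systemctl stop ceph-osd@716
$ ceph auth del osd.634
$ ceph auth del osd.716
$ ceph osd lost 634 --yes-i-really-mean-it
$ ceph osd lost 716 --yes-i-really-mean-it
  ... once pg 1.323 reports incomplete or stale ...
$ ceph pg force_create_pg 1.323

force_create_pg is the pre-Luminous form of the force re-create command; double-check the exact syntax against your 11.2.0 documentation before running any of this.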
> ________________________________________
> From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of george.vasilakakos@xxxxxxxxxx [george.vasilakakos@xxxxxxxxxx]
> Sent: 22 February 2017 14:35
> To: wido@xxxxxxxx; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: PG stuck peering after host reboot
> 
> So what I see there is this for osd.307:
> 
>     "empty": 1,
>     "dne": 0,
>     "incomplete": 0,
>     "last_epoch_started": 0,
>     "hit_set_history": {
>         "current_last_update": "0'0",
>         "history": []
>     }
> }
> 
> last_epoch_started is 0 and empty is 1. The other OSDs are reporting last_epoch_started 16806 and empty 0.
> 
> I noticed that too and was wondering why it never completed recovery and joined.
> 
> > If you stop osd.307 and maybe mark it as out, does that help?
> 
> No, I see the same thing I saw when I took 595 out:
> 
> [root@ceph-mon1 ~]# ceph pg map 1.323
> osdmap e22392 pg 1.323 (1.323) -> up [985,1391,240,127,937,362,267,320,7,634,716] acting [2147483647,1391,240,127,937,362,267,320,7,634,716]
> 
> Another OSD gets chosen as the primary but never becomes acting on its own.
> 
> Another 11 PGs are reporting being undersized and having ITEM_NONE in their acting sets as well.
> 
> > ________________________________________
> > From: Wido den Hollander [wido@xxxxxxxx]
> > Sent: 22 February 2017 12:18
> > To: Vasilakakos, George (STFC,RAL,SC); ceph-users@xxxxxxxxxxxxxx
> > Subject: RE: PG stuck peering after host reboot
> > 
> > > On 21 February 2017 at 15:35, george.vasilakakos@xxxxxxxxxx wrote:
> > > 
> > > 
> > > I have noticed something odd with the ceph-objectstore-tool command:
> > > 
> > > It always reports PG X not found, even on healthy OSDs/PGs. The 'list' op works on both healthy and unhealthy PGs.
> > > 
> > 
> > Are you sure you are supplying the correct PG ID?
> > 
> > I just tested with (Jewel 10.2.5):
> > 
> > $ ceph pg ls-by-osd 5
> > $ systemctl stop ceph-osd@5
> > $ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 --op info --pgid 10.d0
> > $ systemctl start ceph-osd@5
> > 
> > Can you double-check that?
> > 
> > It's weird that the PG can't be found on those OSDs by the tool.
> > 
> > Wido
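One thing that might explain the 'PG not found' answers (an educated guess, not something confirmed in this thread): on an erasure-coded pool the PG lives on disk under its shard-qualified name, so ceph-objectstore-tool may only find it when given the shard ID as well, which is the same form the remove op on osd.7 accepted earlier. Roughly:

# systemctl stop ceph-osd@7
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op info --pgid 1.323s8
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op list --pgid 1.323s8
# systemctl start ceph-osd@7

The shard suffix differs per OSD: it is the rank that OSD holds for the PG, visible in the directory names under current/ (1.323s8_head on osd.7, for example).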
> > > ________________________________________
> > > From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of george.vasilakakos@xxxxxxxxxx [george.vasilakakos@xxxxxxxxxx]
> > > Sent: 21 February 2017 10:17
> > > To: wido@xxxxxxxx; ceph-users@xxxxxxxxxxxxxx; bhubbard@xxxxxxxxxx
> > > Subject: Re: PG stuck peering after host reboot
> > > 
> > > > Can you, for the sake of redundancy, post the sequence of commands you executed and their output?
> > > 
> > > [root@ceph-sn852 ~]# systemctl stop ceph-osd@307
> > > [root@ceph-sn852 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-307 --op info --pgid 1.323
> > > PG '1.323' not found
> > > [root@ceph-sn852 ~]# systemctl start ceph-osd@307
> > > 
> > > I did the same thing for 307 (the new up but not acting primary) and all the OSDs in the original set (including 595). The output was exactly the same. I don't have the whole session log handy from all those sessions, but here's a sample from one that's easy to pick out:
> > > 
> > > [root@ceph-sn832 ~]# systemctl stop ceph-osd@7
> > > [root@ceph-sn832 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op info --pgid 1.323
> > > PG '1.323' not found
> > > [root@ceph-sn832 ~]# systemctl start ceph-osd@7
> > > [root@ceph-sn832 ~]# ll /var/lib/ceph/osd/ceph-7/current/
> > > 0.18_head/      11.1c8s5_TEMP/  13.3b_head/    1.74s1_TEMP/    2.256s6_head/  2.c3s10_TEMP/  3.b9s4_head/
> > > 0.18_TEMP/      1.16s1_head/    13.3b_TEMP/    1.8bs9_head/    2.256s6_TEMP/  2.c4s3_head/   3.b9s4_TEMP/
> > > 1.106s10_head/  1.16s1_TEMP/    1.3a6s0_head/  1.8bs9_TEMP/    2.2d5s2_head/  2.c4s3_TEMP/   4.34s10_head/
> > > 1.106s10_TEMP/  1.274s5_head/   1.3a6s0_TEMP/  2.174s10_head/  2.2d5s2_TEMP/  2.dbs7_head/   4.34s10_TEMP/
> > > 11.12as10_head/ 1.274s5_TEMP/   1.3e4s9_head/  2.174s10_TEMP/  2.340s8_head/  2.dbs7_TEMP/   commit_op_seq
> > > 11.12as10_TEMP/ 1.2ds8_head/    1.3e4s9_TEMP/  2.1c1s10_head/  2.340s8_TEMP/  3.159s3_head/  meta/
> > > 11.148s2_head/  1.2ds8_TEMP/    14.1a_head/    2.1c1s10_TEMP/  2.36es10_head/ 3.159s3_TEMP/  nosnap
> > > 11.148s2_TEMP/  1.323s8_head/   14.1a_TEMP/    2.1d0s6_head/   2.36es10_TEMP/ 3.170s1_head/  omap/
> > > 11.165s6_head/  1.323s8_TEMP/   1.6fs9_head/   2.1d0s6_TEMP/   2.3d3s10_head/ 3.170s1_TEMP/
> > > 11.165s6_TEMP/  13.32_head/     1.6fs9_TEMP/   2.1efs2_head/   2.3d3s10_TEMP/ 3.1aas5_head/
> > > 11.1c8s5_head/  13.32_TEMP/     1.74s1_head/   2.1efs2_TEMP/   2.c3s10_head/  3.1aas5_TEMP/
> > > [root@ceph-sn832 ~]# ll /var/lib/ceph/osd/ceph-7/current/1.323s8_
> > > 1.323s8_head/ 1.323s8_TEMP/
> > > [root@ceph-sn832 ~]# ll /var/lib/ceph/osd/ceph-7/current/1.323s8_head/DIR_3/DIR_2/DIR_
> > > DIR_3/ DIR_7/ DIR_B/ DIR_F/
> > > [root@ceph-sn832 ~]# ll /var/lib/ceph/osd/ceph-7/current/1.323s8_head/DIR_3/DIR_2/DIR_3/DIR_
> > > DIR_0/ DIR_1/ DIR_2/ DIR_3/ DIR_4/ DIR_5/ DIR_6/ DIR_7/ DIR_8/ DIR_9/ DIR_A/ DIR_B/ DIR_C/ DIR_D/ DIR_E/ DIR_F/
> > > [root@ceph-sn832 ~]# ll /var/lib/ceph/osd/ceph-7/current/1.323s8_head/DIR_3/DIR_2/DIR_3/DIR_1/
> > > total 271276
> > > -rw-r--r--. 1 ceph ceph 8388608 Feb 3 22:07 datadisk\srucio\sdata16\u13TeV\s11\sad\sDAOD\uTOPQ4.09383728.\u000436.pool.root.1.0000000000000001__head_2BA91323__1_ffffffffffffffff_8
> > > 
> > > > If you run a find in the data directory of the OSD, does that PG show up?
> > > 
> > > OSDs 595 (used to be 0), 1391 (1), 240 (2) and 7 (7, the one that started this) have a 1.323sX_head directory. OSD 307 does not.
> > > I have not checked the other OSDs in the PG yet.
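For the OSDs that have not been checked yet, something along these lines run on each OSD host should show any on-disk shard of this PG, whatever rank the OSD holds (a sketch; it assumes the standard /var/lib/ceph/osd/ceph-<id> mount points seen above):

# find /var/lib/ceph/osd/ceph-*/current -maxdepth 1 -type d -name '1.323s*_head'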
> > > 
> > > > Wido
> > > 
> > > > > Best regards,
> > > > > 
> > > > > George

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com