Re: PG stuck peering after host reboot

Since we need this pool to work again, we decided to take the data loss and try to move on.

So far, no luck. We tried a force create but, as expected, with a PG that is not peering this did absolutely nothing.
We also tried the rm-past-intervals and remove ops of ceph-objectstore-tool, as well as manually deleting the PG's data directories on the disks. The PG remains down+remapped, with two OSDs failing to join the acting set; these have been restarted multiple times to no avail.
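
For reference, this is roughly the sequence we ran on each affected OSD (osd.7 / shard 1.323s8 shown as the example; the shard suffix differs per OSD):

# systemctl stop ceph-osd@7
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op rm-past-intervals --pgid 1.323s8
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op remove --pgid 1.323s8
# systemctl start ceph-osd@7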

# ceph pg map 1.323
osdmap e23122 pg 1.323 (1.323) -> up [595,1391,240,127,937,362,267,320,986,634,716] acting [595,1391,240,127,937,362,267,320,986,2147483647,2147483647]

We have also seen some very odd behaviour. Straight after a restart of all OSDs in the PG, once everything else has settled down, we see:
# ceph pg map 1.323
osdmap e22909 pg 1.323 (1.323) -> up [595,1391,240,127,937,362,267,320,986,634,716] acting [595,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647,2147483647]

From that state, restarting osd.595 results in:

# ceph pg map 1.323
osdmap e22921 pg 1.323 (1.323) -> up [595,1391,240,127,937,362,267,320,986,634,716] acting [2147483647,1391,240,127,937,362,267,320,986,634,716]

Restarting osd.595 again doesn't change this. Another restart of all OSDs in the PG puts it back in the first state shown above, with just the last two slots as ITEM_NONE. (2147483647 is CRUSH_ITEM_NONE, the placeholder CRUSH uses for a slot it could not fill.)

Another strange thing is that on osd.7 (the one originally at rank 8 that was restarted and caused this problem) the objectstore tool fails to remove the PG and crashes out:

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op remove --pgid 1.323s8
 marking collection for removal
setting '_remove' omap key
finish_remove_pgs 1.323s8_head removing 1.323s8
 *** Caught signal (Aborted) **
 in thread 7fa713782700 thread_name:tp_fstore_op
 ceph version 11.2.0 (f223e27eeb35991352ebc1f67423d4ebc252adb7)
 1: (()+0x97463a) [0x7fa71c47563a]
 2: (()+0xf370) [0x7fa71935a370]
 3: (snappy::RawUncompress(snappy::Source*, char*)+0x374) [0x7fa71abd0cd4]
 4: (snappy::RawUncompress(char const*, unsigned long, char*)+0x3d) [0x7fa71abd0e2d]
 5: (leveldb::ReadBlock(leveldb::RandomAccessFile*, leveldb::ReadOptions const&, leveldb::BlockHandle const&, leveldb::BlockContents*)+0x35e) [0x7fa71b08007e]
 6: (leveldb::Table::BlockReader(void*, leveldb::ReadOptions const&, leveldb::Slice const&)+0x276) [0x7fa71b081196]
 7: (()+0x3c820) [0x7fa71b083820]
 8: (()+0x3c9cd) [0x7fa71b0839cd]
 9: (()+0x3ca3e) [0x7fa71b083a3e]
 10: (()+0x39c75) [0x7fa71b080c75]
 11: (()+0x21e20) [0x7fa71b068e20]
 12: (()+0x223c5) [0x7fa71b0693c5]
 13: (LevelDBStore::LevelDBWholeSpaceIteratorImpl::seek_to_first(std::string const&)+0x3d) [0x7fa71c3ecb1d]
 14: (LevelDBStore::LevelDBTransactionImpl::rmkeys_by_prefix(std::string const&)+0x138) [0x7fa71c3ec028]
 15: (DBObjectMap::clear_header(std::shared_ptr<DBObjectMap::_Header>, std::shared_ptr<KeyValueDB::TransactionImpl>)+0x1d0) [0x7fa71c400a40]
 16: (DBObjectMap::_clear(std::shared_ptr<DBObjectMap::_Header>, std::shared_ptr<KeyValueDB::TransactionImpl>)+0xa1) [0x7fa71c401171]
 17: (DBObjectMap::clear(ghobject_t const&, SequencerPosition const*)+0x1ff) [0x7fa71c4075bf]
 18: (FileStore::lfn_unlink(coll_t const&, ghobject_t const&, SequencerPosition const&, bool)+0x241) [0x7fa71c2c0d41]
 19: (FileStore::_remove(coll_t const&, ghobject_t const&, SequencerPosition const&)+0x8e) [0x7fa71c2c171e]
 20: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0x433e) [0x7fa71c2d8c6e]
 21: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, unsigned long, ThreadPool::TPHandle*)+0x3b) [0x7fa71c2db75b]
 22: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x2cd) [0x7fa71c2dba5d]
 23: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb59) [0x7fa71c63e189]
 24: (ThreadPool::WorkThread::entry()+0x10) [0x7fa71c63f160]
 25: (()+0x7dc5) [0x7fa719352dc5]
 26: (clone()+0x6d) [0x7fa71843e73d]
Aborted
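
The abort happens inside snappy/leveldb while the tool is clearing the PG's omap keys (rmkeys_by_prefix -> ReadBlock -> RawUncompress), which points at a corrupt block in osd.7's omap leveldb rather than at the PG data itself. With the OSD stopped, walking the whole store should confirm that; something along these lines (the exact ceph-kvstore-tool invocation may differ between releases):

# ceph-kvstore-tool /var/lib/ceph/osd/ceph-7/current/omap list > /dev/null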

At this point all we want to achieve is for the PG to peer again (and soon) without us having to delete the pool.

Any help would be appreciated...
________________________________________
From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of george.vasilakakos@xxxxxxxxxx [george.vasilakakos@xxxxxxxxxx]
Sent: 22 February 2017 14:35
To: wido@xxxxxxxx; ceph-users@xxxxxxxxxxxxxx
Subject: Re:  PG stuck peering after host reboot

So what I see there is this for osd.307:

    "empty": 1,
    "dne": 0,
    "incomplete": 0,
    "last_epoch_started": 0,
    "hit_set_history": {
        "current_last_update": "0'0",
        "history": []
    }
}

last_epoch_started is 0 and empty is 1. The other OSDs are reporting last_epoch_started 16806 and empty 0.
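
For reference, the snippet above is from the peer_info section of ceph pg query; to pull out a single peer, something like this should work (assuming jq is available and that the peer field is dumped as a string):

# ceph pg 1.323 query | jq '.peer_info[] | select(.peer == "307")'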

I noticed that too and was wondering why it never completed recovery and joined the acting set.

> If you stop osd.307 and maybe mark it as out, does that help?

No, I see the same thing I saw when I took 595 out:

[root@ceph-mon1 ~]# ceph pg map 1.323
osdmap e22392 pg 1.323 (1.323) -> up [985,1391,240,127,937,362,267,320,7,634,716] acting [2147483647,1391,240,127,937,362,267,320,7,634,716]

Another OSD gets chosen as the up primary but never becomes acting; its slot in the acting set stays ITEM_NONE.

Another 11 PGs are also reporting undersized, with ITEM_NONE in their acting sets.
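
In case it helps, a quick way to list every PG with a hole in its up or acting set (ITEM_NONE prints as 2147483647):

# ceph pg dump pgs_brief | grep 2147483647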

> ________________________________________
> From: Wido den Hollander [wido@xxxxxxxx]
> Sent: 22 February 2017 12:18
> To: Vasilakakos, George (STFC,RAL,SC); ceph-users@xxxxxxxxxxxxxx
> Subject: RE:  PG stuck peering after host reboot
>
> > On 21 February 2017 at 15:35, george.vasilakakos@xxxxxxxxxx wrote:
> >
> >
> > I have noticed something odd with the ceph-objectstore-tool command:
> >
> > It always reports PG X not found, even on healthy OSDs/PGs. The 'list' op works on both healthy and unhealthy PGs.
> >
>
> Are you sure you are supplying the correct PG ID?
>
> I just tested with (Jewel 10.2.5):
>
> $ ceph pg ls-by-osd 5
> $ systemctl stop ceph-osd@5
> $ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 --op info --pgid 10.d0
> $ systemctl start ceph-osd@5
>
> Can you double-check that?
>
> It's weird that the PG can't be found on those OSDs by the tool.
>
> Wido
>
>
> > ________________________________________
> > From: ceph-users [ceph-users-bounces@xxxxxxxxxxxxxx] on behalf of george.vasilakakos@xxxxxxxxxx [george.vasilakakos@xxxxxxxxxx]
> > Sent: 21 February 2017 10:17
> > To: wido@xxxxxxxx; ceph-users@xxxxxxxxxxxxxx; bhubbard@xxxxxxxxxx
> > Subject: Re:  PG stuck peering after host reboot
> >
> > > Can you, for the sake of redundancy, post the sequence of commands you executed and their output?
> >
> > [root@ceph-sn852 ~]# systemctl stop ceph-osd@307
> > [root@ceph-sn852 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-307 --op info --pgid 1.323
> > PG '1.323' not found
> > [root@ceph-sn852 ~]# systemctl start ceph-osd@307
> >
> > I did the same thing for 307 (the new up, but not acting, primary) and all the OSDs in the original set (including 595). The output was exactly the same. I don't have the full session logs handy, but here's a sample from one that's easy to pick out:
> >
> > [root@ceph-sn832 ~]# systemctl stop ceph-osd@7
> > [root@ceph-sn832 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --op info --pgid 1.323
> > PG '1.323' not found
> > [root@ceph-sn832 ~]# systemctl start ceph-osd@7
> > [root@ceph-sn832 ~]# ll /var/lib/ceph/osd/ceph-7/current/
> > 0.18_head/      11.1c8s5_TEMP/  13.3b_head/     1.74s1_TEMP/    2.256s6_head/   2.c3s10_TEMP/   3.b9s4_head/
> > 0.18_TEMP/      1.16s1_head/    13.3b_TEMP/     1.8bs9_head/    2.256s6_TEMP/   2.c4s3_head/    3.b9s4_TEMP/
> > 1.106s10_head/  1.16s1_TEMP/    1.3a6s0_head/   1.8bs9_TEMP/    2.2d5s2_head/   2.c4s3_TEMP/    4.34s10_head/
> > 1.106s10_TEMP/  1.274s5_head/   1.3a6s0_TEMP/   2.174s10_head/  2.2d5s2_TEMP/   2.dbs7_head/    4.34s10_TEMP/
> > 11.12as10_head/ 1.274s5_TEMP/   1.3e4s9_head/   2.174s10_TEMP/  2.340s8_head/   2.dbs7_TEMP/    commit_op_seq
> > 11.12as10_TEMP/ 1.2ds8_head/    1.3e4s9_TEMP/   2.1c1s10_head/  2.340s8_TEMP/   3.159s3_head/   meta/
> > 11.148s2_head/  1.2ds8_TEMP/    14.1a_head/     2.1c1s10_TEMP/  2.36es10_head/  3.159s3_TEMP/   nosnap
> > 11.148s2_TEMP/  1.323s8_head/   14.1a_TEMP/     2.1d0s6_head/   2.36es10_TEMP/  3.170s1_head/   omap/
> > 11.165s6_head/  1.323s8_TEMP/   1.6fs9_head/    2.1d0s6_TEMP/   2.3d3s10_head/  3.170s1_TEMP/
> > 11.165s6_TEMP/  13.32_head/     1.6fs9_TEMP/    2.1efs2_head/   2.3d3s10_TEMP/  3.1aas5_head/
> > 11.1c8s5_head/  13.32_TEMP/     1.74s1_head/    2.1efs2_TEMP/   2.c3s10_head/   3.1aas5_TEMP/
> > [root@ceph-sn832 ~]# ll /var/lib/ceph/osd/ceph-7/current/1.323s8_
> > 1.323s8_head/ 1.323s8_TEMP/
> > [root@ceph-sn832 ~]# ll /var/lib/ceph/osd/ceph-7/current/1.323s8_head/DIR_3/DIR_2/DIR_
> > DIR_3/ DIR_7/ DIR_B/ DIR_F/
> > [root@ceph-sn832 ~]# ll /var/lib/ceph/osd/ceph-7/current/1.323s8_head/DIR_3/DIR_2/DIR_3/DIR_
> > DIR_0/ DIR_1/ DIR_2/ DIR_3/ DIR_4/ DIR_5/ DIR_6/ DIR_7/ DIR_8/ DIR_9/ DIR_A/ DIR_B/ DIR_C/ DIR_D/ DIR_E/ DIR_F/
> > [root@ceph-sn832 ~]# ll /var/lib/ceph/osd/ceph-7/current/1.323s8_head/DIR_3/DIR_2/DIR_3/DIR_1/
> > total 271276
> > -rw-r--r--. 1 ceph ceph 8388608 Feb  3 22:07 datadisk\srucio\sdata16\u13TeV\s11\sad\sDAOD\uTOPQ4.09383728.\u000436.pool.root.1.0000000000000001__head_2BA91323__1_ffffffffffffffff_8
> >
> > > If you run a find in the data directory of the OSD, does that PG show up?
> >
> > OSDs 595 (formerly rank 0), 1391 (1), 240 (2) and 7 (8, the one that started this) all have a 1.323sX_head directory. OSD 307 does not.
> > I have not checked the other OSDs in the PG yet.
> >
> > > Wido
> >
> > >
> > > Best regards,
> > >
> > > George
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


