Thank you, and sorry for bothering you; I was new to the ceph-users list
and couldn't cancel my message. I found out what happened a few hours later.
The main problem was that I had moved one OSD out of its "host hostname {}"
CRUSH map entry (which I wanted to do). Everything was OK, but restarting
the OSD placed it back into the "host hostname {}" CRUSH map section
automatically.
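(For illustration, moving an OSD out of its host bucket can be done with
something like the following; the weight and bucket name here are examples,
not necessarily what I used:
ceph osd crush set osd.1 1.0 root=default
which re-places osd.1 directly under the default root instead of under its
host bucket.)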
I solved it by setting
osd crush update on start = false
(see the ceph-crush-location hook section at
http://ceph.com/docs/master/rados/operations/crush-map/).
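For reference, the setting goes into ceph.conf, e.g. in the [osd] section:

[osd]
    osd crush update on start = false

With that in place, a restarted OSD keeps whatever CRUSH location it was
given instead of re-registering itself under its host bucket on startup.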
You can consider this solved; there is no problem with Ceph, only my poor
knowledge caused it.
JP
On 2014-11-10 20:53, Craig Lewis wrote:
"nothing to send, going to standby" isn't necessarily bad, I see it from
time to time. It shouldn't stay like that for long though. If it's
been 5 minutes, and the cluster still isn't doing anything, I'd restart
that osd.
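(Depending on the init system, that would be something like
sudo service ceph restart osd.1
on sysvinit, or
sudo restart ceph-osd id=1
on upstart; the OSD id is just an example.)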
On Fri, Nov 7, 2014 at 1:55 PM, Jan Pekař <jan.pekar@xxxxxxxxx> wrote:
Hi,
I was testing Ceph cluster map changes and got into a stuck state
which seems to be indefinite.
First, a description of what I have done.
I'm testing a special case with only one copy of PGs (pool size = 1).
All PGs were on a single OSD, osd.0. I created a second osd.1 and modified the
cluster map to transfer one pool (metadata) to the newly created osd.1.
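(For anyone reproducing this: such a cluster map change typically follows
the usual decompile/edit/recompile cycle; the file names and the rule id
below are placeholders:
ceph osd getcrushmap -o map.bin
crushtool -d map.bin -o map.txt
# edit map.txt: add a rule whose "take" step starts at osd.1
crushtool -c map.txt -o map.new
ceph osd setcrushmap -i map.new
ceph osd pool set metadata crush_ruleset <rule-id>
)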
The PGs started to remap and the "objects degraded" number was dropping,
so everything looked normal.
During that recovery process I restarted both OSD daemons.
After that I noticed that the PGs that should be remapped were in a "stale"
state (stale+active+remapped+backfilling), and others were stuck with stale
states as well.
I tried to run ceph pg force_create_pg on one PG that should be
remapped, but nothing changed (that is the 1 stuck/creating PG below
in ceph health).
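(The stuck PGs can be listed and examined with, e.g.:
ceph health detail
ceph pg dump_stuck stale
ceph pg <pgid> query
where <pgid> is one of the reported PG ids.)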
The command rados -p metadata ls hangs, so the data are unavailable, but
they should be there.
What should I do in this state to get it working?
ceph -s below:
cluster 93418692-8e2e-4689-a237-ed5b47f39f72
health HEALTH_WARN 52 pgs backfill; 1 pgs backfilling; 63 pgs
stale; 1 pgs stuck inactive; 63 pgs stuck stale; 54 pgs stuck
unclean; recovery 107232/1881806 objects degraded (5.698%);
mon.imatic-mce low disk space
monmap e1: 1 mons at {imatic-mce=192.168.11.165:6789/0},
election epoch 1, quorum 0 imatic-mce
mdsmap e450: 1/1/1 up {0=imatic-mce=up:active}
osdmap e275: 2 osds: 2 up, 2 in
pgmap v51624: 448 pgs, 4 pools, 790 GB data, 1732 kobjects
804 GB used, 2915 GB / 3720 GB avail
107232/1881806 objects degraded (5.698%)
52 stale+active+remapped+wait_backfill
1 creating
1 stale+active+remapped+backfilling
10 stale+active+clean
384 active+clean
Last message in the OSD logs:
2014-11-07 22:17:45.402791 deb4db70 0 -- 192.168.11.165:6804/29564
>> 192.168.11.165:6807/29939 pipe(0x9d52f00 sd=213 :53216 s=2
pgs=1 cs=1 l=0 c=0x2c7f58c0).fault with nothing to send, going to
standby
Thank you for your help.
With regards,
Jan Pekar, ceph fan
--
============
Ing. Jan Pekař
jan.pekar@xxxxxxxxx | +420603811737
----
Imatic | Jagellonská 14 | Praha 3 | 130 00
http://www.imatic.cz
============
--
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com