All OSDs are up and in, and the CRUSH map looks okay.
ceph -s:
    health HEALTH_WARN
           9 pgs stuck inactive
           9 pgs stuck unclean
           149 requests are blocked > 32 sec
           too many PGs per OSD (4393 > max 300)
           pool .rgw.buckets has too few pgs
           noout flag(s) set
    osdmap e17160: 24 osds: 24 up, 24 in
           flags noout
     pgmap ...
           46068 active+clean
           9 creating
The PGs belong to the pool '.rgw.buckets' and have been stuck in 'creating' for hours.
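For reference, this is how I have been inspecting the stuck PGs (standard CLI on 0.94; <pgid> is a placeholder for one of the nine PG ids):

    ceph health detail              # lists the ids of the stuck PGs
    ceph pg dump_stuck inactive     # the 9 PGs with their acting sets
    ceph pg <pgid> query            # per-PG detail on why creating never finishes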
Also, the RGWs are stuck. After a restart, each one connects to only 4 OSD
ports and then exits with 'initialization timeout'. However, telnetting to
the other OSD ports works fine.
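To trace the timeout I plan to raise the RGW log level in ceph.conf (the section name depends on how your rgw instance is named; 'client.radosgw.gateway' here is just an example):

    [client.radosgw.gateway]
        debug rgw = 20
        debug ms = 1

and then restart the gateway and watch its log.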
Please help.
Thanks!
On 2015-05-15 18:17, flisky wrote:
Hi list,
I reformatted some OSDs to increase the journal_size. I did it in a hurry,
so some PGs lost data and are now in the 'incomplete' state.
The cluster is stuck in the 'creating' state after **ceph osd lost xx** and
**force_create_pg**. I found that the dir 'osd-xx/current/xx.xxx_head' on
the last OSD in the acting set contains only an empty file, and the PG is
never created again after I move the dir out.
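For completeness, the commands I ran were of this form (osd and pg ids elided, same as above):

    ceph osd lost <osd-id> --yes-i-really-mean-it   # declare the reformatted OSD's data lost
    ceph pg force_create_pg <pg-id>                 # ask the cluster to recreate the PG as empty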
This caused slow requests, which made our online RGW service unstable.
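(If it helps, I have been dumping the blocked requests on a given OSD through its admin socket, assuming the socket is reachable on that host:

    ceph daemon osd.<id> dump_ops_in_flight
)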
Could anyone give me a hint on how to continue debugging this?
BTW, the Ceph version is 0.94.1.
Thanks,
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com