Re: force_create_pg stuck on creating


 



After I restarted all OSD daemons batch by batch, the number of blocked requests went down and the RGWs came back online.
The count from 'netstat -anp|grep radosgw|grep ESTABLISHED|wc -l'
eventually reaches 25 (24 OSDs + 1 MON).
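The counting pipeline above can be seen in isolation with canned `netstat` output; the addresses and PID below are made up for illustration, and on a live box you would pipe real `netstat -anp` instead:

```shell
# Sample netstat output (fabricated addresses/PID for the sketch).
netstat_sample='tcp 0 0 10.0.0.5:45122 10.0.0.11:6800 ESTABLISHED 2101/radosgw
tcp 0 0 10.0.0.5:45123 10.0.0.12:6802 ESTABLISHED 2101/radosgw
tcp 0 0 10.0.0.5:45124 10.0.0.10:6789 ESTABLISHED 2101/radosgw
tcp 0 0 10.0.0.5:45125 10.0.0.13:6804 TIME_WAIT   -'

# Keep only established sockets owned by radosgw, then count them.
echo "$netstat_sample" | grep radosgw | grep ESTABLISHED | wc -l   # prints 3
```

On a healthy cluster with 24 OSDs and 1 MON all reachable, the live count settles at 25, matching the number above.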

About the slow requests, the log shows:
'slow request 15360.083412 seconds old, received at 2015-05-15 19:45:41.529106: osd_op(client.100343.0:2714 default.5606.3_xxxxx [getxattrs,stat,read 0~524288] 12.24e29d3f ack+read+known_if_redirected e17160) currently reached_pg'
But pg 12.24e is active+clean, so why would the request be slow/blocked?
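The slow-request line above packs several useful fields together (age, the client op, the object hash, and the last state the op reached). A small parsing sketch, assuming the Hammer-era field layout shown in the log line:

```python
import re

# The exact slow-request line quoted above (Hammer 0.94.x format).
line = ("slow request 15360.083412 seconds old, received at "
        "2015-05-15 19:45:41.529106: osd_op(client.100343.0:2714 "
        "default.5606.3_xxxxx [getxattrs,stat,read 0~524288] "
        "12.24e29d3f ack+read+known_if_redirected e17160) "
        "currently reached_pg")

m = re.search(
    r"slow request (?P<age>[\d.]+) seconds old.*"
    r"osd_op\((?P<client>\S+) (?P<obj>\S+) \[(?P<ops>[^\]]+)\] "
    r"(?P<hash>\S+) .*\) currently (?P<state>\w+)", line)

print(m.group("age"))    # how long the op has been blocked: 15360.083412
print(m.group("hash"))   # pool.object-hash, e.g. 12.24e29d3f
print(m.group("state"))  # last recorded state, e.g. reached_pg
```

'reached_pg' means the op arrived at the pg but has not been serviced yet, which is why an op can sit blocked even while the pg reports active+clean.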

I think the RGW would eventually become unavailable if more requests got blocked. Is that true?

Thanks!

On 2015-05-15 23:54, flisky wrote:
All OSDs are up and in, and the crushmap should be okay.

ceph -s:

      health HEALTH_WARN
             9 pgs stuck inactive
             9 pgs stuck unclean
             149 requests are blocked > 32 sec
             too many PGs per OSD (4393 > max 300)
             pool .rgw.buckets has too few pgs
             noout flag(s) set
      osdmap e17160: 24 osds: 24 up, 24 in
             flags noout
       pgmap ...
                46068 active+clean
                    9 creating

The pgs belong to the pool '.rgw.buckets' and have been stuck in the creating state for hours.

Also, the RGWs are stuck. After a restart, an RGW connects to only 4 OSD ports and exits with
'initialization timeout'. However, telnetting to the other OSD
ports works fine.

Please help.

Thanks!

On 2015-05-15 18:17, flisky wrote:
Hi list,

I reformatted some OSDs to increase the journal_size. Because I did it in
a hurry, some pgs lost data and are now in the incomplete state.

The cluster is stuck in the 'creating' status after **ceph osd lost xx** and
**force_create_pg**. I find that the directory 'osd-xx/current/xx.xxx_head' on
each OSD in the last acting set contains only an empty file, and the pg is
never created again after I move the directory out.
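For anyone reproducing the steps above, a hedged sketch of the commands involved; OSD_ID and PGID are placeholders (the original post used 'xx'), and the head-directory path assumes the default FileStore layout:

```shell
# Placeholders -- substitute the OSD declared lost and the stuck pg.
OSD_ID=3
PGID=12.24e

# On-disk directory a FileStore OSD keeps for this pg (default layout):
HEAD_DIR="/var/lib/ceph/osd/ceph-${OSD_ID}/current/${PGID}_head"
echo "$HEAD_DIR"

# The commands the post describes (do NOT run blindly on a healthy
# cluster -- 'osd lost' discards data the cluster thinks that OSD holds):
# ceph osd lost ${OSD_ID} --yes-i-really-mean-it
# ceph pg force_create_pg ${PGID}
# ceph pg ${PGID} query    # inspect the acting set and peering state
```

`ceph pg <pgid> query` on one of the stuck pgs is usually the fastest way to see which OSDs the creating pg is waiting on.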

This caused slow requests, which made our online RGW service unstable.

Could anyone give me a hint to continue debugging this?

BTW, the Ceph version is 0.94.1.

Thanks,

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





