Thanks! I tried restarting osd.11 (the primary osd for the incomplete pg)
and that helped a LOT. We went from 0/1 op/s to 10-800+ op/s!

We still have "HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs
stuck unclean", but at least we can use our cluster :-)

ceph pg dump_stuck inactive
ok
pg_stat objects mip degr unf bytes log disklog state state_stamp v reported up acting last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
2.1f6 118 0 0 0 403118080 0 0 incomplete 2013-07-30 06:08:18.883179 11127'11658123 12914'1506 [11,9] [11,9] 10321'11641837 2013-07-28 00:59:09.552640 10321'11641837

Thanks again!
Jeff

On Tue, Jul 30, 2013 at 11:44:58AM +0200, Jens Kristian Søgaard wrote:
> Hi,
>
>> This is the same issue as yesterday, but I'm still searching for a
>> solution. We have a lot of data on the cluster that we need and can't
>> health HEALTH_WARN 1 pgs incomplete; 1 pgs stuck inactive; 1 pgs
>
> I'm not claiming to have an answer, but I have a suggestion you can try.
>
> Try running "ceph pg dump" to list all the pgs. Grep for the ones that
> are inactive / incomplete. Note which osds they are on - they are listed
> in the square brackets, with the primary being the first in the list.
>
> Now try restarting the primary osd for the stuck pg and see if that
> could possibly shift things into place.
>
> --
> Jens Kristian Søgaard, Mermaid Consulting ApS,
> jens@xxxxxxxxxxxxxxxxxxxx,
> http://www.mermaidconsulting.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
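
[For anyone finding this thread in the archives: the "find the primary osd
for the stuck pg" step Jens describes can be scripted. A minimal sketch,
assuming the column layout shown in the dump_stuck output above (column
positions can differ between Ceph versions, and the `primary_osds` helper
name and the restart command are illustrative, not from the thread):]

```python
# Sketch: parse "ceph pg dump_stuck inactive" output and report each
# stuck pg's primary osd, i.e. the first entry in the bracketed
# "acting" list, as Jens notes. Column indices follow the header line
# shown in the thread; verify them against your own Ceph version.

def primary_osds(dump_lines):
    """Map pg_id -> primary osd id for inactive/incomplete pgs."""
    result = {}
    for line in dump_lines:
        fields = line.split()
        # Skip the "ok" status line and the pg_stat header row.
        if len(fields) < 15 or fields[0] in ("ok", "pg_stat"):
            continue
        state = fields[8]
        if "incomplete" in state or "inactive" in state:
            # fields[14] is "acting", e.g. "[11,9]"; primary is first.
            acting = fields[14].strip("[]").split(",")
            result[fields[0]] = int(acting[0])
    return result

# The sample line is the one from the dump above.
sample = ["2.1f6 118 0 0 0 403118080 0 0 incomplete 2013-07-30 "
          "06:08:18.883179 11127'11658123 12914'1506 [11,9] [11,9] "
          "10321'11641837 2013-07-28 00:59:09.552640 10321'11641837"]

for pg, osd in primary_osds(sample).items():
    # Restarting this osd is what unstuck things for Jeff.
    print(f"pg {pg}: primary is osd.{osd}")
```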