Thanks, Greg. Following your lead, we discovered that the proper 'set_choose_tries xxx' value had not been applied to *this* pool's rule, and we updated the cluster accordingly. We then moved a random OSD out and back in to 'kick' things, but no joy: we still have the 4 'remapped' PGs.

BTW: the 4 PGs look OK from a basic rule perspective: they're on different OSDs on different hosts, which is what we're concerned with... but it seems CRUSH has different goals for them, and they are inactive.
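(For reference, the set_choose_tries change above was applied via the usual decompile/edit/recompile cycle, roughly as follows; the file names are arbitrary and 'xxx' stands for the value we picked:)

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # edit crushmap.txt: inside this pool's rule, add near the top of the steps:
  #   step set_choose_tries xxx
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new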
So, back to the basic question: can we get just the 'remapped' PGs to re-sort themselves without causing massive data movement, or is a complete re-sort the only way to get to the desired CRUSH state?
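(In other words, for each of these PGs the 'up' set CRUSH now wants differs from the 'acting' set they currently sit on, which is easy to see per PG with something like the following, using the first stuck PG from the dump below as an example:)

  ceph pg map 11.6e5     # shows the up set vs. the acting set for that PG
  ceph pg 11.6e5 query   # full peering/recovery detail, if that helps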
As for the force_create_pg command: if it creates a blank PG element on a specific OSD (yes?), what happens to the existing PG elements on other OSDs? Could we use force_create_pg followed by a 'pg repair' command to get things back to the proper state, in a very targeted way?
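(To be explicit, the targeted sequence we have in mind would be something like the following; we have not tried it yet, and the PG id is just one of the stuck PGs from the dump below:)

  ceph pg force_create_pg 11.6e5   # recreate the PG (blank) where CRUSH wants it?
  ceph pg repair 11.6e5            # then ask the OSDs to repair/reconcile the copies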
For reference, below is the (reduced) output of dump_stuck:
pg_stat  objects  mip  degr  unf  bytes       log   disklog  state     state_stamp                 v            reported      up      up_pri  acting  acting_pri
11.6e5   284      0    0     0    2366787669  3012  3012     remapped  2015-04-23 13:19:02.373507  68310'49068  78500:123712  [0,92]  0       [0,84]  0
11.8bb   283      0    0     0    2349260884  3001  3001     remapped  2015-04-23 13:19:02.550735  70105'49776  78500:125026  [0,92]  0       [0,88]  0
11.e2f   280      0    0     0    2339844181  3001  3001     remapped  2015-04-23 13:18:59.299589  68310'51082  78500:119555  [77,4]  77      [77,34] 77
11.323   282      0    0     0    2357186647  3001  3001     remapped  2015-04-23 13:18:58.970396  70105'48961  78500:123987  [0,37]  0       [0,19]  0