On 07/10/2012 06:11 PM, Mark Kirkwood wrote:
I am seeing this:

# ceph -s
   health HEALTH_WARN 256 pgs stale; 256 pgs stuck stale
   monmap e1: 3 mons at {ved1=192.168.122.11:6789/0,ved2=192.168.122.12:6789/0,ved3=192.168.122.13:6789/0}, election epoch 18, quorum 0,1,2 ved1,ved2,ved3
   osdmap e62: 4 osds: 4 up, 4 in
    pgmap v47148: 768 pgs: 512 active+clean, 256 stale+active+clean; 2224 MB data, 15442 MB used, 86907 MB / 102350 MB avail
   mdsmap e1: 0/0/1

In particular, 256 pgs are stuck stale. I've tried (a) waiting a while (overnight), (b) a rolling restart of all 4 OSDs, and (c) restarting all ceph services on all 4 nodes, all without changing this. As far as I understand what "stuck stale" means, I can't see why they need to stay that way, given that all OSDs and mons are up. (I have no MDS configured.) Any ideas? Or is this just expected?

Regards

Mark
What does 'ceph pg dump_stuck stale' show? Stale means that the monitors haven't gotten updates about those pgs from the osds within a certain period of time (default is 300 seconds), so something may be wrong with your crushmap or with those pgs themselves.

Josh
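A toy model of the staleness rule described in the reply may help: a pg counts as stale when the monitors' last update about it is older than a threshold (300 seconds, the default quoted above). The `PgStatus` record, the `stale_pgs` function, and the sample data are hypothetical illustrations, not Ceph's monitor code. The point of the model is that the check is made per pg on the age of its last report, not on whether the OSDs are marked up.

```python
from dataclasses import dataclass

# Default threshold quoted in the reply (seconds).
STALE_THRESHOLD_S = 300.0


@dataclass
class PgStatus:
    """Hypothetical per-pg record: pg id and when the monitors last heard about it."""
    pgid: str
    last_report: float  # timestamp (seconds) of the last update from the OSDs


def stale_pgs(pgs, now, threshold=STALE_THRESHOLD_S):
    """Return the sorted ids of pgs whose last report is older than `threshold` seconds."""
    return sorted(pg.pgid for pg in pgs if now - pg.last_report > threshold)


if __name__ == "__main__":
    now = 10_000.0
    pgs = [
        PgStatus("0.1", now - 10),     # reported recently: not stale
        PgStatus("0.2", now - 301),    # just past the threshold: stale
        PgStatus("2.7f", now - 3600),  # silent for an hour: stale
    ]
    print(stale_pgs(pgs, now))  # ['0.2', '2.7f']
```

In this model, restarting OSDs only clears a pg's stale flag if reports about that specific pg start arriving again, which is why the reply suggests looking at the crushmap or the pgs themselves.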