Hello,

I am running Ceph v0.56 and am currently trying to recover a cluster that got completely stuck after one OSD reached 95% capacity. The data distribution does not look even: all three OSDs I use are 256 GB each, yet one of them filled up much faster than the others:

osd-1:
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg00-osd  252G  173G   80G  69% /var/lib/ceph/osd/ceph-0

osd-2:
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg00-osd  252G  203G   50G  81% /var/lib/ceph/osd/ceph-1

osd-3:
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg00-osd  252G  240G   13G  96% /var/lib/ceph/osd/ceph-2

At the moment the MDS is showing the following behaviour:

2013-01-08 16:25:47.006354 b4a73b70  0 mds.0.objecter  FULL, paused modify 0x9ba63c0 tid 23448
2013-01-08 16:26:47.005211 b4a73b70  0 mds.0.objecter  FULL, paused modify 0xca86c30 tid 23449

so it does not respond to any mount requests.

I have tried raising the full threshold in several ways, e.g.:

ceph mon tell \* injectargs '--mon-osd-full-ratio 98'
ceph mon tell \* injectargs '--mon-osd-full-ratio 0.98'

and also put 'mon osd full ratio = 0.98' into the configuration of each mon. However:

chef@ceph-node03:/var/log/ceph$ ceph health detail
HEALTH_ERR 1 full osd(s)
osd.2 is full at 95%

The cluster still treats 95% as the threshold, so mount requests still get no response.

chef@ceph-node03:/var/log/ceph$ rados -p data bench 10 write
 Maintaining 16 concurrent writes of 4194304 bytes for at least 10 seconds.
 Object prefix: benchmark_data_ceph-node03_3903
2013-01-08 16:33:02.363206 b6be3710  0 client.9958.objecter  FULL, paused modify 0xa467ff0 tid 1
2013-01-08 16:33:02.363618 b6be3710  0 client.9958.objecter  FULL, paused modify 0xa468780 tid 2
2013-01-08 16:33:02.363741 b6be3710  0 client.9958.objecter  FULL, paused modify 0xa468f88 tid 3
2013-01-08 16:33:02.364056 b6be3710  0 client.9958.objecter  FULL, paused modify 0xa469348 tid 4
2013-01-08 16:33:02.364171 b6be3710  0 client.9958.objecter  FULL, paused modify 0xa469708 tid 5
2013-01-08 16:33:02.365024 b6be3710  0 client.9958.objecter  FULL, paused modify 0xa469ac8 tid 6
2013-01-08 16:33:02.365187 b6be3710  0 client.9958.objecter  FULL, paused modify 0xa46a2d0 tid 7
2013-01-08 16:33:02.365296 b6be3710  0 client.9958.objecter  FULL, paused modify 0xa46a690 tid 8
2013-01-08 16:33:02.365402 b6be3710  0 client.9958.objecter  FULL, paused modify 0xa46aa50 tid 9
2013-01-08 16:33:02.365508 b6be3710  0 client.9958.objecter  FULL, paused modify 0xa46ae10 tid 10
2013-01-08 16:33:02.365635 b6be3710  0 client.9958.objecter  FULL, paused modify 0xa46b1d0 tid 11
2013-01-08 16:33:02.365742 b6be3710  0 client.9958.objecter  FULL, paused modify 0xa46b590 tid 12
2013-01-08 16:33:02.365868 b6be3710  0 client.9958.objecter  FULL, paused modify 0xa46b950 tid 13
2013-01-08 16:33:02.365975 b6be3710  0 client.9958.objecter  FULL, paused modify 0xa46bd10 tid 14
2013-01-08 16:33:02.366096 b6be3710  0 client.9958.objecter  FULL, paused modify 0xa46c0d0 tid 15
2013-01-08 16:33:02.366203 b6be3710  0 client.9958.objecter  FULL, paused modify 0xa46c490 tid 16
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0      16        16         0         0         0         -         0
     1      16        16         0         0         0         -         0
     2      16        16         0         0         0         -         0

So rados writes do not work either.

chef@ceph-node03:/var/log/ceph$ ceph osd reweight-by-utilization
no change: average_util: 0.812678, overload_util: 0.975214. overloaded osds: (none)

This does not change anything either.

Is there any chance to recover the cluster?
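P.S. One thing I have not tried yet is changing the full ratio stored in the PG map itself rather than the mon config option, since as far as I understand the "full" flag is derived from the ratio recorded there. If someone can confirm this is safe and that the syntax is right for 0.56, my next attempt would be roughly the following (the 0.8 reweight value is just a guess to push some data off osd.2):

# raise the full threshold recorded in the PG map so paused writes can resume
ceph pg set_full_ratio 0.98

# then lower the weight of the full OSD so some PGs move off it
# (0.8 is only an example value, not something I have verified)
ceph osd reweight 2 0.8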
--
...WBR, Roman Hlynovskiy