We have observed very similar behavior. In a 140-OSD cluster (newly
created and idle) ~8000 PGs are available. After adding two new pools
(each with 20000 PGs), 100 of the 140 OSDs go down and out, and the
cluster never recovers. This can be reproduced every time with v0.67
and v0.72; with v0.61 the problem does not show up.

-Dieter

On Thu, Mar 13, 2014 at 10:46:05AM +0100, Gandalf Corvotempesta wrote:
> 2014-03-13 9:02 GMT+01:00 Andrey Korolyov <andrey@xxxxxxx>:
> > Yes, if you have an essentially high amount of committed data in the
> > cluster and/or a large number of PGs (tens of thousands).
>
> I've increased from 64 to 8192 PGs.
>
> > If you have room to experiment with this transition from scratch, you
> > may want to play with the numbers in the OSD queues, since they cause
> > deadlock-like behaviour on operations like increasing the PG count or
> > deleting a large pool. If the cluster has no I/O at all at the moment,
> > such behaviour is definitely not expected.
>
> My cluster was totally idle; it is a test setup built from the
> ceph-ansible repository and nobody was using it.
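For anyone trying to reproduce this, a minimal sketch of a more
conservative way to grow a pool's PG count (the pool name "testpool",
the step sizes, and the throttle values below are placeholders, not
taken from this thread): keep the backfill/recovery throttles low and
raise pg_num in small increments instead of one large jump.

  # ceph.conf, [osd] section - limit concurrent backfill/recovery work
  osd max backfills = 1
  osd recovery max active = 1

  # raise pg_num first, then pgp_num, in steps; wait for HEALTH_OK
  # between steps before moving to the next value (512, 1024, ...)
  ceph osd pool set testpool pg_num 256
  ceph osd pool set testpool pgp_num 256

Whether this avoids the OSD flapping seen on v0.67/v0.72 is not
something verified in this thread; it only limits how much peering and
data movement the cluster has to absorb at once.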