Hello all,

I have seen this problem for a long time, but never investigated it further. After several days of stable test runs, it is our last known show stopper before using Ceph in production. We are running 0.47.2 on 32-bit.

If we restart the MDS (or all Ceph daemons) on all nodes, either one after another or all at once, they first recover, and then the active MDS starts to spin at full CPU and no longer answers. After a while the next one takes over, starts to spin as well, and so on, until the whole cluster is unusable. This is completely reproducible and happens even without any active client.

As expected, ceph -w shows many lines like

  2012-06-15 11:35:28.588775 mds e959: 1/1/1 up {0=3=up:active(laggy or crashed)}

Stopping all services on all nodes for minutes or longer and then restarting them does not help: the MDS starts spinning again. However, if we reboot the whole cluster, everything works again.

Today's MDS log is available at
https://download.m-privacy.de/homeuser-mds.0.log.gz

Is this a known problem? It has been with us for a long time now, but since rebooting used to help, we never tracked it down.

Amon Ott
-- 
Dr. Amon Ott
m-privacy GmbH           Tel: +49 30 24342334
Am Köllnischen Park 1    Fax: +49 30 24342336
10179 Berlin             http://www.m-privacy.de

Amtsgericht Charlottenburg, HRB 84946

Managing Directors:
 Dipl.-Kfm. Holger Maczkowsky,
 Roman Maczkowsky

GnuPG-Key-ID: 0x2DD3A649