Amon,
I've been going through my backlog of flagged emails and came across
this one. Did you ever manage to collect the information you were
going to gather for this bug?
-Greg

On Fri, Jun 15, 2012 at 9:44 AM, Sage Weil <sage@xxxxxxxxxxx> wrote:
> On Fri, 15 Jun 2012, Amon Ott wrote:
>> Hello all,
>>
>> I have seen this for a long time, but never investigated further. After stable
>> test runs for several days, this is our last known show stopper before using
>> Ceph in production. We are running 0.47.2 on 32-bit.
>>
>> If we restart MDS (or all ceph daemons) on all nodes, one after another or all
>> together, they first recover and then the active one starts to spin with full
>> CPU and does not answer any more. After a while, the next takes over, starts
>> to spin, etc., until the whole cluster is unusable. This is completely
>> reproducible and happens even without any active client.
>>
>> As expected, ceph -w shows lots of
>> "2012-06-15 11:35:28.588775 mds e959: 1/1/1 up {0=3=up:active(laggy or
>> crashed)}"
>>
>> It does not help to stop all services on all nodes for minutes or longer and
>> to restart them - MDS will start spinning again. But: if we reboot the whole
>> cluster, everything goes back to work.
>>
>> Today's MDS log is available at
>> https://download.m-privacy.de/homeuser-mds.0.log.gz
>>
>> Is this a known problem? It has been with us for a looong time now, but since
>> rebooting used to help, we never tracked it down.
>
> I haven't seen this before. Can you attach to the spinning process with
> gdb and send us a dump of what the threads are doing? 'thread apply all
> bt'. I opened #2596:
>
> http://tracker.newdream.net/issues/2596
>
> Thanks!
> sage
>
>>
>> Amon Ott
>> --
>> Dr. Amon Ott
>> m-privacy GmbH           Tel: +49 30 24342334
>> Am Köllnischen Park 1    Fax: +49 30 24342336
>> 10179 Berlin             http://www.m-privacy.de
>>
>> Amtsgericht Charlottenburg, HRB 84946
>>
>> Managing directors:
>> Dipl.-Kfm. Holger Maczkowsky,
>> Roman Maczkowsky
>>
>> GnuPG-Key-ID: 0x2DD3A649
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html