Re: MDS spinning wild after restart on all nodes

Sage Weil <sage@xxxxxxxxxxx> · Fri, 15 Jun 2012 09:44:23 -0700 (PDT)

On Fri, 15 Jun 2012, Amon Ott wrote:
> Hello all,
> 
> I have seen this for a long time, but never investigated further. After stable 
> test runs for several days, this is our last known show stopper before using 
> Ceph in production. We are running 0.47.2 on 32-bit.
> 
> If we restart MDS (or all ceph daemons) on all nodes, one after another or all 
> together, they first recover and then the active one starts to spin with full 
> cpu and does not answer any more. After a while, the next takes over, starts 
> to spin, etc., until the whole cluster is unusable. This is completely 
> reproducible and happens even without any active client.
> 
> As expected, ceph -w shows lots of
> "2012-06-15 11:35:28.588775   mds e959: 1/1/1 up {0=3=up:active(laggy or crashed)}"
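For anyone watching a cluster for this state, here is a small sketch of scanning ceph -w output for MDS ranks reported as laggy. It assumes the summary format is exactly what the line above shows ("mds e<epoch>: ... up {<rank>=<name>=<state>, ...}"), inferred from this one line rather than from Ceph's source, and it does not try to interpret the "1/1/1" counters. The names parse_mds_line and RankState are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Assumed format, inferred from the quoted ceph -w line:
#   "mds e<epoch>: <counters> up {<rank>=<name>=<state>, ...}"
MDS_LINE = re.compile(r"mds e(?P<epoch>\d+): (?P<counts>\S+) up \{(?P<ranks>[^}]*)\}")


class RankState(NamedTuple):
    rank: str
    name: str
    state: str

    @property
    def laggy(self) -> bool:
        # The monitor appends "(laggy or crashed)" to the state in the quoted line.
        return "laggy" in self.state


def parse_mds_line(line: str) -> Optional[Tuple[int, List[RankState]]]:
    """Return (epoch, ranks) for an mdsmap summary line, or None if it is not one."""
    m = MDS_LINE.search(line)
    if m is None:
        return None
    ranks = []
    for entry in m.group("ranks").split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split only on the first two '=' so the state text stays intact.
        rank, name, state = entry.split("=", 2)
        ranks.append(RankState(rank, name, state))
    return int(m.group("epoch")), ranks
```

Feeding it the (unwrapped) line above yields epoch 959 and a single rank 0, held by daemon "3", whose laggy property is True. Lines that are not mdsmap summaries return None, so the whole ceph -w stream can be passed through it.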
> 
> It does not help to stop all services on all nodes for minutes or longer and 
> to restart them - MDS will restart spinning. But: If we reboot the whole 
> cluster, everything goes back to work.
> 
> Today's MDS log is available at 
> https://download.m-privacy.de/homeuser-mds.0.log.gz
> 
> Is this a known problem? It has been with us for a looong time now, but since 
> rebooting used to help, we never tracked it down.

I haven't seen this before.  Can you attach to the spinning process with 
gdb and send us a backtrace of what every thread is doing ('thread apply 
all bt')?  I opened #2596:

	http://tracker.newdream.net/issues/2596
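A minimal sketch of collecting the requested backtraces non-interactively, assuming gdb is installed and the caller is allowed to ptrace the daemon (typically root). It runs gdb in batch mode, attached with -p to the given PID, executing 'thread apply all bt'. The helper names are hypothetical, and the PID of the spinning ceph-mds has to be supplied by hand (for example from top).

```python
import subprocess
from typing import List


def gdb_backtrace_cmd(pid: int) -> List[str]:
    # -batch: exit after running the -ex commands (gdb detaches on exit)
    # -p:     attach to an already running process
    # -ex:    gdb command to run once attached
    return ["gdb", "-batch", "-p", str(pid), "-ex", "thread apply all bt"]


def dump_thread_backtraces(pid: int, timeout: float = 120.0) -> str:
    """Attach gdb to pid, print a backtrace of every thread, return gdb's stdout.

    subprocess.TimeoutExpired is raised if gdb does not finish within timeout.
    """
    result = subprocess.run(
        gdb_backtrace_cmd(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # keep whatever output gdb produced even on a nonzero exit
    )
    return result.stdout
```

Usage would be something like print(dump_thread_backtraces(12345)), with 12345 standing in for the real PID. The daemon is stopped while gdb is attached, so taking two or three dumps a few seconds apart shows whether the busy thread stays in the same code path.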

Thanks!
sage

> 
> Amon Ott
> -- 
> Dr. Amon Ott
> m-privacy GmbH           Tel: +49 30 24342334
> Am Köllnischen Park 1    Fax: +49 30 24342336
> 10179 Berlin             http://www.m-privacy.de
> 
> Amtsgericht Charlottenburg, HRB 84946
> 
> Geschäftsführer:
>  Dipl.-Kfm. Holger Maczkowsky,
>  Roman Maczkowsky
> 
> GnuPG-Key-ID: 0x2DD3A649