On Thursday 29 December 2011, Amon Ott wrote:
> I finally got the test cluster freed up for more Ceph testing.
>
> On Friday 23 December 2011, Gregory Farnum wrote:
> > Unfortunately there's not enough info in this log either. If you can
> > reproduce it with "mds debug = 20" and put that log somewhere, it
> > ought to be enough to work out what's going on, though. Sorry. :(
> > -Greg
>
> Here is what the MDS logs with debug 20. No idea if it really helps.
> The cluster is still in the broken state; should I try to reproduce it
> with a recreated Ceph FS and debug 20? This could be GBs of logs.

Update: I recreated the Ceph FS with release 0.40. It broke only because
of a btrfs bug hitting two of the four nodes (after ca. one day of heavy
load) and recovered without problems once the nodes were back. Then I
recreated it with ext4 as the OSD storage area and have not managed to
break it within four days, two of those under heavy load.

This means the bug is probably fixed. It might be related to the
automatic reconnect of the MDS, which avoids metadata inconsistencies. :)

Amon Ott
--
Dr. Amon Ott
m-privacy GmbH           Tel: +49 30 24342334
Am Köllnischen Park 1    Fax: +49 30 24342336
10179 Berlin             http://www.m-privacy.de

Amtsgericht Charlottenburg, HRB 84946
Geschäftsführer: Dipl.-Kfm. Holger Maczkowsky, Roman Maczkowsky

GnuPG-Key-ID: 0x2DD3A649
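
For readers following the thread, Greg's "mds debug = 20" suggestion would typically be applied in ceph.conf along these lines. This is only a sketch: the `[mds]` section placement and the `debug mds` / `debug ms` key spellings follow common ceph.conf conventions and are assumptions here, not taken from this thread — check the option names against your Ceph release before relying on them.

```ini
; ceph.conf fragment (assumed layout) -- raise MDS log verbosity so a
; crash/replay can be diagnosed from the logs
[mds]
    ; verbose MDS subsystem logging, as suggested in the quoted reply
    debug mds = 20
    ; optionally also log messenger traffic at a low level (assumption)
    debug ms = 1
```

After changing the setting, the MDS daemon would need a restart (or a runtime injection, if the release supports it) for the new level to take effect; note that level 20 can produce very large logs, as the thread warns.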