On 10/22/2014 07:41 PM, Andrey Korolyov wrote:
Hello,

On a small test cluster, the following sequence left a freshly formatted OSD unable to rejoin the cluster:

- update the cluster sequentially from cuttlefish to dumpling to firefly,
- change the tunables and wait for recovery to complete,
- shut down a single OSD and reformat its filestore and journal,
- start it again (auth caps and key remained the same).

The version is 5a10b95f7968ecac1f2af4abf9fb91347a290544. Any ideas about why this happens are very welcome.

I suspect the root cause is some resource that the OSD keeps requesting from line 29499 of the strace onward (it probably starts earlier, but that line clearly separates the init stage from the loop in the log). The requests begin just after journal and collections initialization, but I have no idea what the resource may be.

Thanks!

Strace: http://xdel.ru/downloads/osd0.out.gz
Can you send us your ceph.conf (edit out any sensitive information it may contain), the log for the OSD you are having trouble with (with 'debug monc = 10' and 'debug ms = 1'), and the logs for your monitors (with 'debug mon = 10' and 'debug ms = 1')?
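
For reference, one way to set those debug levels is in ceph.conf itself. This is only a sketch: it assumes the settings go under the usual [osd] and [mon] sections, and that the daemons are restarted afterward so they pick up the change. Adjust it to however your configuration is laid out.

```ini
# Sketch of ceph.conf additions; the section placement is an assumption.
[osd]
    debug monc = 10
    debug ms = 1

[mon]
    debug mon = 10
    debug ms = 1
```

The levels can be dropped back to their previous values once the logs have been collected, since 'debug ms = 1' in particular makes the logs grow quickly.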
-Joao

--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com