Trying to remove one of the folders made mds.a and mds.b stop, so something is wrong in my mds.

ceph -s gives

2012-06-06 06:19:19.899973 pg v1220573: 1152 pgs: 1152 active+clean; 191 GB data, 393 GB used, 973 GB / 1379 GB avail
2012-06-06 06:19:19.905097 mds e78: 1/1/1 up {0=c=up:active}
2012-06-06 06:19:19.905200 osd e1114: 8 osds: 8 up, 8 in
2012-06-06 06:19:19.905400 log 2012-06-06 05:51:31.499366 osd.3 10.0.6.11:6804/2933 804 : [INF] 0.c scrub ok
2012-06-06 06:19:19.905598 mon e1: 3 mons at {a=10.0.6.10:6789/0,b=10.0.6.11:6789/0,c=10.0.6.12:6789/0}

I checked the log files on ceph1 and ceph2, where I have my mon.

mds.a
-------------------
cessful recovery!
 -2> 2012-06-06 05:38:35.956195 7f2d5ea08700 1 mds.0.12 active_start
 -1> 2012-06-06 05:38:35.967760 7f2d5ea08700 1 mds.0.12 cluster recovered.
 0> 2012-06-06 05:38:37.200297 7f2d5ea08700 -1 mds/AnchorServer.cc: In function 'virtual void AnchorServer::handle_query(MMDSTableRequest*)' thread 7f2d5ea08700 time 2012-06-06 05:38:37.198981
mds/AnchorServer.cc: 249: FAILED assert(anchor_map.count(curino) == 1)
 ceph version 0.46 (commit:cb7f1c9c7520848b0899b26440ac34a8acea58d1)
 1: (AnchorServer::handle_query(MMDSTableRequest*)+0x175) [0x6bdc95]
 2: (MDS::handle_deferrable_message(Message*)+0xd84) [0x4b0474]
 3: (MDS::_dispatch(Message*)+0xaf8) [0x4c50b8]
 4: (MDS::ms_dispatch(Message*)+0x1fb) [0x4c628b]
 5: (SimpleMessenger::dispatch_entry()+0x979) [0x7acb49]
 6: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7336ed]
 7: (()+0x68ca) [0x7f2d6346e8ca]
 8: (clone()+0x6d) [0x7f2d61cf692d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- end dump of recent events ---
2012-06-06 05:38:37.203277 7f2d5ea08700 -1 *** Caught signal (Aborted) **
 in thread 7f2d5ea08700
 ceph version 0.46 (commit:cb7f1c9c7520848b0899b26440ac34a8acea58d1)
 1: /usr/bin/ceph-mds() [0x814279]
 2: (()+0xeff0) [0x7f2d63476ff0]
 3: (gsignal()+0x35) [0x7f2d61c591b5]
 4: (abort()+0x180) [0x7f2d61c5bfc0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f2d624eddc5]
 6: (()+0xcb166) [0x7f2d624ec166]
 7: (()+0xcb193) [0x7f2d624ec193]
 8: (()+0xcb28e) [0x7f2d624ec28e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x940) [0x74f9b0]
 10: (AnchorServer::handle_query(MMDSTableRequest*)+0x175) [0x6bdc95]
 11: (MDS::handle_deferrable_message(Message*)+0xd84) [0x4b0474]
 12: (MDS::_dispatch(Message*)+0xaf8) [0x4c50b8]
 13: (MDS::ms_dispatch(Message*)+0x1fb) [0x4c628b]
 14: (SimpleMessenger::dispatch_entry()+0x979) [0x7acb49]
 15: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7336ed]
 16: (()+0x68ca) [0x7f2d6346e8ca]
 17: (clone()+0x6d) [0x7f2d61cf692d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
 0> 2012-06-06 05:38:37.203277 7f2d5ea08700 -1 *** Caught signal (Aborted) **
 in thread 7f2d5ea08700
 ceph version 0.46 (commit:cb7f1c9c7520848b0899b26440ac34a8acea58d1)
 1: /usr/bin/ceph-mds() [0x814279]
 2: (()+0xeff0) [0x7f2d63476ff0]
 3: (gsignal()+0x35) [0x7f2d61c591b5]
 4: (abort()+0x180) [0x7f2d61c5bfc0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f2d624eddc5]
 6: (()+0xcb166) [0x7f2d624ec166]
 7: (()+0xcb193) [0x7f2d624ec193]
 8: (()+0xcb28e) [0x7f2d624ec28e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x940) [0x74f9b0]
 10: (AnchorServer::handle_query(MMDSTableRequest*)+0x175) [0x6bdc95]
 11: (MDS::handle_deferrable_message(Message*)+0xd84) [0x4b0474]
 12: (MDS::_dispatch(Message*)+0xaf8) [0x4c50b8]
 13: (MDS::ms_dispatch(Message*)+0x1fb) [0x4c628b]
 14: (SimpleMessenger::dispatch_entry()+0x979) [0x7acb49]
 15: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7336ed]
 16: (()+0x68ca) [0x7f2d6346e8ca]
 17: (clone()+0x6d) [0x7f2d61cf692d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- end dump of recent events ---

ceph -v reports the following on my different servers:

root@ceph1:~# ceph -v
ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
root@ceph1:~# ssh ceph2 ceph -v
ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
root@ceph1:~# ssh ceph3 ceph -v
ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
root@ceph1:~# ssh ceph4 ceph -v
ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)

Does the 0.46 above reflect the version in use when the error occurred, or am I running the wrong binaries? I use the Debian packages.
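One way to see which versions actually wrote the dumps is to pull the version banner out of each mds log and compare it with what ceph -v prints. A minimal sketch in Python, assuming the banner always has the form seen in the dumps ("ceph version X (commit:HASH)"); the function name and the sample text are only illustrations, not part of any ceph tool:

```python
import re

# Assumed banner format, taken from the crash dumps in this thread:
# "ceph version 0.46 (commit:cb7f1c9c...)"
VERSION_RE = re.compile(r"ceph version (\S+) \(commit:([0-9a-f]+)\)")


def versions_in_log(text):
    """Return the distinct (version, commit) pairs, in order of first appearance."""
    seen = []
    for match in VERSION_RE.finditer(text):
        pair = (match.group(1), match.group(2))
        if pair not in seen:
            seen.append(pair)
    return seen


# Hypothetical sample: two banners copied from the mds.a and mds.b dumps.
sample = (
    "ceph version 0.46 (commit:cb7f1c9c7520848b0899b26440ac34a8acea58d1)\n"
    " 1: /usr/bin/ceph-mds() [0x814279]\n"
    "ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)\n"
)

for version, commit in versions_in_log(sample):
    print(version, commit[:8])
```

If a log written after the upgrade still shows the old version, that would suggest the daemon was started from the old binary and not restarted since, but that is only a guess until it is checked against the timestamps.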
mds.b
-------------------
 0> 2012-06-06 05:38:17.533743 7fae49945700 -1 mds/AnchorServer.cc: In function 'virtual void AnchorServer::handle_query(MMDSTableRequest*)' thread 7fae49945700 time 2012-06-06 05:38:17.523498
mds/AnchorServer.cc: 249: FAILED assert(anchor_map.count(curino) == 1)
 ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
 1: (AnchorServer::handle_query(MMDSTableRequest*)+0x175) [0x6c1125]
 2: (MDS::handle_deferrable_message(Message*)+0xd84) [0x4b1984]
 3: (MDS::_dispatch(Message*)+0xafa) [0x4c61da]
 4: (MDS::ms_dispatch(Message*)+0x1fb) [0x4c73ab]
 5: (SimpleMessenger::dispatch_entry()+0x979) [0x7b4729]
 6: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7365cd]
 7: (()+0x68ca) [0x7fae4e3ab8ca]
 8: (clone()+0x6d) [0x7fae4cc3392d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- end dump of recent events ---
2012-06-06 05:38:17.711889 7fae49945700 -1 *** Caught signal (Aborted) **
 in thread 7fae49945700
 ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
 1: /usr/bin/ceph-mds() [0x81da89]
 2: (()+0xeff0) [0x7fae4e3b3ff0]
 3: (gsignal()+0x35) [0x7fae4cb961b5]
 4: (abort()+0x180) [0x7fae4cb98fc0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fae4d42adc5]
 6: (()+0xcb166) [0x7fae4d429166]
 7: (()+0xcb193) [0x7fae4d429193]
 8: (()+0xcb28e) [0x7fae4d42928e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x940) [0x7555f0]
 10: (AnchorServer::handle_query(MMDSTableRequest*)+0x175) [0x6c1125]
 11: (MDS::handle_deferrable_message(Message*)+0xd84) [0x4b1984]
 12: (MDS::_dispatch(Message*)+0xafa) [0x4c61da]
 13: (MDS::ms_dispatch(Message*)+0x1fb) [0x4c73ab]
 14: (SimpleMessenger::dispatch_entry()+0x979) [0x7b4729]
 15: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7365cd]
 16: (()+0x68ca) [0x7fae4e3ab8ca]
 17: (clone()+0x6d) [0x7fae4cc3392d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
 0> 2012-06-06 05:38:17.711889 7fae49945700 -1 *** Caught signal (Aborted) **
 in thread 7fae49945700
 ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
 1: /usr/bin/ceph-mds() [0x81da89]
 2: (()+0xeff0) [0x7fae4e3b3ff0]
 3: (gsignal()+0x35) [0x7fae4cb961b5]
 4: (abort()+0x180) [0x7fae4cb98fc0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7fae4d42adc5]
 6: (()+0xcb166) [0x7fae4d429166]
 7: (()+0xcb193) [0x7fae4d429193]
 8: (()+0xcb28e) [0x7fae4d42928e]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x940) [0x7555f0]
 10: (AnchorServer::handle_query(MMDSTableRequest*)+0x175) [0x6c1125]
 11: (MDS::handle_deferrable_message(Message*)+0xd84) [0x4b1984]
 12: (MDS::_dispatch(Message*)+0xafa) [0x4c61da]
 13: (MDS::ms_dispatch(Message*)+0x1fb) [0x4c73ab]
 14: (SimpleMessenger::dispatch_entry()+0x979) [0x7b4729]
 15: (SimpleMessenger::DispatchThread::entry()+0xd) [0x7365cd]
 16: (()+0x68ca) [0x7fae4e3ab8ca]
 17: (clone()+0x6d) [0x7fae4cc3392d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- end dump of recent events ---

> For future reference, that error was because the active MDS server was in replay. I can't tell why it didn't move on to active from what you posted, but I imagine it just got a little stuck since restarting made it work out.
> -Greg
>
> On Tuesday, June 5, 2012 at 1:05 PM, Martin Wilderoth wrote:
>
> > Hello Again,
> >
> > I restarted the mds on all servers and then it worked again
> >
> > /Regards Martin
> >
> > > Hello
> > >
> > > > Hi Martin,
> > > >
> > > > On 06/05/2012 08:07 PM, Martin Wilderoth wrote:
> > > > > Hello
> > > > >
> > > > > Is there a way to recover this error.
> > > > >
> > > > > mount -t ceph 10.0.6.10:/ /mnt -vv -o name=admin,secret=XXXXXXXXXXXXXXXXXXXXXXX
> > > > > [ 506.640433] libceph: loaded (mon/osd proto 15/24, osdmap 5/6 5/6)
> > > > > [ 506.650594] ceph: loaded (mds proto 32)
> > > > > [ 506.652353] libceph: client0 fsid a9d5f9e1-4bb9-4fab-b79b-ba4457631b01
> > > > > [ 506.670876] Intel AES-NI instructions are not detected.
> > > > > [ 506.678861] libceph: mon0 10.0.6.10:6789 session established
> > > > > mount: 10.0.6.10:/: can't read superblock
> > > >
> > > > Could you share some more information? For example the output from: ceph -s
> > >
> > > 2012-06-05 20:25:05.307914 pg v1189604: 1152 pgs: 1152 active+clean; 191 GB data, 393 GB used, 973 GB / 1379 GB avail
> > > 2012-06-05 20:25:05.315871 mds e60: 1/1/1 up {0=c=up:replay}, 2 up:standby
> > > 2012-06-05 20:25:05.315965 osd e1106: 8 osds: 8 up, 8 in
> > > 2012-06-05 20:25:05.316165 log 2012-06-05 20:24:50.425527 mon.0 10.0.6.10:6789/0 75 : [INF] mds.? 10.0.6.11:6800/22974 up:boot
> > > 2012-06-05 20:25:05.316371 mon e1: 3 mons at {a=10.0.6.10:6789/0,b=10.0.6.11:6789/0,c=10.0.6.12:6789/0}
> > >
> > > > Did you change anything to the cluster since it worked? And what version
> > > > are you running?
> > >
> > > I have not made any changes. I installed at version 0.46, upgraded earlier, and have been testing with
> > > ceph, ceph-fuse and backuppc. It was during the ceph-fuse testing that it hung.
> > >
> > > Current version
> > > ceph version 0.47.2 (commit:8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
> > >
> > > > > One of my mds logs has 24G of data.
> > > >
> > > > Is it still running?
> > > I have restarted mds.a and mds.b and they seem to be running, but not everything works.
> > > mds.a was stopped; I am not sure about mds.b, but it has a big logfile.
> > >
> > > > > I have some rbd devices that I would like to keep.
> > > >
> > > > RBD doesn't use the MDS nor the POSIX filesystem, so you will probably
> > > > be fine, but we need the output of "ceph -s" first.
> > > >
> > > > Does this work?
> > > > $ rbd ls
> > >
> > > This works. I'm still using the rbd with no problem.
> > > > $ rados -p rbd ls
> > >
> > > Seems to work; it reports something similar to:
> > > rb.0.2.00000000052e
> > > rb.0.0.0000000002f2
> > > rb.0.7.000000000345
> > > rb.0.7.000000000896
> > > rb.0.0.000000000102
> > > rb.0.9.000000000172
> > > rb.0.1.000000000350
> > > rb.0.4.000000000180
> > > rb.0.4.00000000068b
> > > rb.0.5.00000000054c
> > > rb.0.2.0000000001e1
> > >
> > > > Wido
> > >
> > > /Regards Martin

Regards / Med Vänlig Hälsning

Martin Wilderoth
VD
Linserv AB
Enhagsslingan 4A
SE-187 40 Täby
www.linserv.se
Tel: +46(0)8-473 60 63
Fax: +46(0)70-969 09 19
Email: martin.wilderoth@xxxxxxxxxx
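
For reference, the replay state Greg mentions is visible in the mds line of ceph -s: the earlier output shows {0=c=up:replay}, the later one {0=c=up:active}. A minimal sketch in Python that pulls the per-rank state out of that line, assuming the line keeps the form shown in this thread; the function names are made up for illustration, and other releases may print the status differently:

```python
import re


def mds_rank_states(mds_line):
    """Map rank -> (daemon name, state) from the mds line of `ceph -s`.

    Assumes the "{rank=name=state,...}" form seen in this thread.
    """
    inside = re.search(r"\{([^}]*)\}", mds_line)
    states = {}
    if inside is None or not inside.group(1):
        return states
    for entry in inside.group(1).split(","):
        rank, name, state = entry.strip().split("=", 2)
        states[int(rank)] = (name, state)
    return states


def ranks_not_active(mds_line):
    """Return only the ranks whose state is something other than up:active."""
    return {
        rank: info
        for rank, info in mds_rank_states(mds_line).items()
        if info[1] != "up:active"
    }


# The two mds lines quoted in this thread.
before = "mds e60: 1/1/1 up {0=c=up:replay}, 2 up:standby"
after = "mds e78: 1/1/1 up {0=c=up:active}"

print(ranks_not_active(before))
print(ranks_not_active(after))
```

A rank that stays in up:replay across repeated checks would match the "a little stuck" situation described above, where mounting fails until the mds is restarted.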