Re: How to migrate from a "missing auth" monitor files to a regular one?

On Sun, Aug 25, 2013 at 10:27 PM, Joao Eduardo Luis <joao.luis@xxxxxxxxxxx> wrote:
On 08/25/2013 12:36 PM, Yu Changyuan wrote:
Today, when I restarted the ceph service, the problem I asked about on the
mailing list before happened again
(http://article.gmane.org/gmane.comp.file-systems.ceph.user/2995):
ceph-mon refuses to start and reports the error below:

2013-08-25 18:24:52.465600 7fb50a496780 -1 mon/AuthMonitor.cc: In
function 'virtual void AuthMonitor::update_from_paxos(bool*)' thread
7fb50a496780 time 2013-08-25 18:24:52.453920
mon/AuthMonitor.cc: 152: FAILED assert(ret == 0)

  ceph version 0.61.7 (8f010aff684e820ecc837c25ac77c7a05d7191ff)
  1: (AuthMonitor::update_from_paxos(bool*)+0x1fee) [0x57742e]
  2: (PaxosService::refresh(bool*)+0x18d) [0x4f630d]
  3: (Monitor::refresh_from_paxos(bool*)+0x57) [0x496477]
  4: (Monitor::init_paxos()+0xf5) [0x496635]
  5: (Monitor::preinit()+0x6bc) [0x4ad1dc]
  6: (main()+0x1bec) [0x48ac8c]
  7: (__libc_start_main()+0xed) [0x7fb5084c660d]
  8: ceph-mon() [0x48dab9]

Then I switched to the "wip-mon-skip-auth-cuttlefish" branch; ceph-mon
complained about some "missing auth inc" (from 1 to 500) and continued running,
and then everything was OK again.

But when I stop this patched ceph-mon and try to start the regular
unpatched ceph-mon, the above error happens again. As I mentioned, the
ceph-mon files I used last time were not the final ones that were 'missing auth',
but the files from 2 days before ceph-mon failed, with which ceph-mon actually
started OK but ceph-osd refused to work.

So, I want to know how to make these ceph-mon files, which only work with the
patched ceph-mon, work again with the regular unpatched ceph-mon.


Changyuan,

Would you mind sending us your monitor store?  If you have other monitors, especially if this doesn't happen on them, the other monitors' stores would also be insightful.
OK, I have sent the monitor's store to you.
Furthermore, what's your cluster history?  At what version was it first deployed, and what versions have you upgraded it to until reaching 0.61.7?
This is the full history of my cluster:
1. My cluster was first deployed on version 0.61.1.
2. When ceph-mon refused to start after a reboot, I upgraded directly to 0.61.7 and made the cluster work again with the patched ceph-mon and the monitor store from 2 days before ceph-mon stopped working.
3. Then I stopped the cluster and restarted it with the regular ceph-mon (and it worked).
4. I restarted the cluster 3 days ago and found that ceph-mon would not start again, so I tried the patched ceph-mon and it worked, but this time I did not restart the cluster with the regular ceph-mon.
5. Then I tried to add another monitor (mon.b) yesterday. After mon.b joined the cluster, the unpatched ceph-mon running on the new host threw the same exception from "AuthMonitor::update_from_paxos" and stopped.
6. I had to stop the cluster and manually remove mon.b, which never started again, from the cluster (I don't have the patched version on the new host), and run the cluster on a single mon.a with the patched ceph-mon again.

  -Joao

--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Best regards,
Changyuan