On Wed, 27 Apr 2011, doki74216@xxxxxxxxx wrote:
> Dear developers
>
> I am testing the reliability of MDS in Ceph File System.
> As we know, in the default setting there is one active MDS and one standby MDS.
> I want to test the reliability of MDS.
> Here is my testing scenario:
> To make it easy to understand, I assume the active MDS is mds0 and the
> standby one is mds1.
> I write data and stop the mds0 daemon with "ceph mds stop 0" at the same time.
> Will the standby mds1 change its status to active?
> I think the system should stay normal and the data should not be lost even
> though there is just one MDS.
> But I ran into many problems....><
>
> (1) I want to know which mds is active and which is standby.
> But I get different answers when I type different commands.
>
> 1. When I type "ceph mds stat".
> It shows: (0=up: active), 1 up: standby
>           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Does it mean mds0

Right, the 0 means mds0.

> is active and mds1 is standby?
> 2. When I type "ceph mds dump -o -"
> It shows: mds0 is standby and mds1 is active.
>
> My question is: Why are there different statuses for mds0 and mds1?
> Which one is correct?

It sounds like you named the MDSs with numeric identifiers. You should
use non-numeric names to avoid this confusion, like mds.a and mds.b. The
numeric role/rank (mds0, mds1) is assigned to cmds instances dynamically
based on who is up and needed in which role at the time.

> (2) I want to know about stopping the active mds.
> If "ceph mds stat" shows the active mds correctly,
> the active mds must be mds0 and the other, mds1, is standby.
> So I type "ceph mds stop 0".
> It shows: "telling mds0 192.168.200.185:6800/14819 to stop" (0)
>                         ^^^^^^^^^^^^^^^^^^ mds1's IP.
> 1. Why does it show the system stopping mds0 but with mds1's IP?

The 'stop' command takes the dynamic role, not the name.

Hope this clears it up!
sage

>
> 2.
> When I type "ceph -w", here is the log:
> ======================================
> mds e43: 1/1/1 up {0=up:active}, 1 up standby
> mds e44: 1/1/1 up {0=up:stoping}, 1 up standby
> mds e45: 1/1/1 up {0=up:replay}
> mds e46: 1/1/1 up {0=up:reconnect}
> mds e47: 1/1/1 up {0=up:rejoin}
> mds e44: 1/1/1 up {0=up:active}
> =====================================
> From the log, which MDS does the system stop?
> My command was "ceph mds stop 0", so shouldn't the log show {1=up:active}?
> It really confuses me...
>
> 3. When I type "ceph mds dump -o -", here is the log:
> 4920: 192.168.200.184:6800/30000 '0' mds0.12 up:active seq 40828
>       ^^^^^^^^^^^^^^^^^^^ mds0's IP
> Why does the system leave the mds0?
>
> Please help me solve these problems, I am very confused...
> Thanks a lot~~~^^
>
>
> By the way, this is my testing environment:
> ==================================================================================
> I set up 7 servers, which include 3 MONs (host1 host2 host3), 2 MDSs (host4
> host5) and 2 OSDs (host6 host7).
> Ceph version 0.26 is installed on my system.
> ==================================================================================
> --
> Best Regards,
> Stefanie Chen
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
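
As a rough illustration of the name-versus-rank distinction in the reply
above, here is a toy model in Python. It is not Ceph code and does not call
any Ceph API: the class ToyMdsMap, its methods, and the daemon names "a" and
"b" are all hypothetical, and the takeover logic is a simplification. It only
tries to show why rank 0 can keep reading "active" while a different daemon
(and a different IP) ends up behind it.

```python
# Toy model only: NOT Ceph code. ToyMdsMap and the daemon names
# "a"/"b" are hypothetical, chosen to mirror the mds.a / mds.b suggestion.
class ToyMdsMap:
    def __init__(self, daemon_names):
        self.rank_holder = {}               # rank (int) -> daemon name
        self.standby = list(daemon_names)   # daemons waiting for a rank

    def assign(self, rank):
        """Hand `rank` to the first standby daemon and return its name."""
        name = self.standby.pop(0)
        self.rank_holder[rank] = name
        return name

    def stop_rank(self, rank):
        """Stop whichever daemon holds `rank` (addressed by rank, not name).

        If a standby exists, it takes over the same rank.
        """
        stopped = self.rank_holder.pop(rank)
        if self.standby:
            self.assign(rank)
        return stopped

    def stat(self):
        active = ", ".join(
            f"{rank}=mds.{name}" for rank, name in sorted(self.rank_holder.items())
        )
        return f"active: {{{active}}}, standby: {len(self.standby)}"


# Daemon "b" happens to come up first, so it is given rank 0.
mdsmap = ToyMdsMap(["b", "a"])
mdsmap.assign(0)
print(mdsmap.stat())            # active: {0=mds.b}, standby: 1

# "Stop 0" targets the rank; the daemon behind it is whoever holds it now.
print(mdsmap.stop_rank(0))      # b
print(mdsmap.stat())            # active: {0=mds.a}, standby: 0
```

With non-numeric names, the rank ("0") and the daemon ("mds.a", "mds.b") can
no longer be mistaken for each other when reading output like this.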