Re: MDS Replay Issues

Gregory Farnum <gregory.farnum@xxxxxxxxxxxxx> · Fri, 1 Jul 2011 09:39:35 -0700



On Fri, Jul 1, 2011 at 9:13 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> There are two parts here:
>
>  - 'ceph mds stop <num>' will tell the given mds rank to export its
> subtrees and leave the active set.  The daemon will either shut down or go
> back to standby (I forget which :).
>
>  - Setting max_mds to a lower value will prevent any new or standby MDSs
> from (re)joining the active set.
>
> The first part isn't yet part of our testing matrix but should work!
That which is not tested does not exist. :)

gregf@kai:~/ceph/src$ ./ceph mds stop 1
2011-07-01 09:14:30.787075 mon <- [mds,stop,1]
2011-07-01 09:14:30.949816 mon0 -> 'telling mds1 10.0.1.205:6805/4545 to stop' (0)
gregf@kai:~/ceph/src$ ls core*
core.cmds.4632
gregf@kai:~/ceph/src$ ./ceph set_max_mds 1
2011-07-01 09:15:57.478978 mon <- [set_max_mds,1]
2011-07-01 09:15:57.479593 mon0 -> 'unrecognized subsystem' (-22)
gregf@kai:~/ceph/src$ ./ceph mds set_max_mds 1
2011-07-01 09:16:20.243239 mon <- [mds,set_max_mds,1]
2011-07-01 09:16:20.432809 mon0 -> 'max_mds = 1' (0)
-----------------------------------------

2011-07-01 09:14:18.615247    pg v13: 18 pgs: 18 active+clean+degraded; 43 KB data, 38643 MB used, 781 GB / 863 GB avail; 37/74 degraded (50.000%)
2011-07-01 09:14:18.615427   mds e8: 2/2/2 up {0=a=up:active,1=b=up:active}
2011-07-01 09:14:18.615461   osd e2: 1 osds: 1 up, 1 in
2011-07-01 09:14:18.615516   log 2011-07-01 09:11:39.639307 osd0 10.0.1.205:6800/4471 18 : [INF] 2.1p0 scrub ok
2011-07-01 09:14:18.615587   mon e1: 1 mons at {a=10.0.1.205:6789/0}
2011-07-01 09:14:30.949841   mds e9: 2/2/2 up {0=a=up:active,1=b=up:stopping}
2011-07-01 09:14:31.201617   mds e10: 1/1/2 up {0=a=up:active}
2011-07-01 09:14:31.503782   log 2011-07-01 09:14:31.201437 mon0 10.0.1.205:6789/0 7 : [INF] mds1 10.0.1.205:6805/4545 down:stopped
2011-07-01 09:14:32.913620    pg v14: 18 pgs: 18 active+clean+degraded; 43 KB data, 38643 MB used, 781 GB / 863 GB avail; 37/74 degraded (50.000%)
2011-07-01 09:14:35.522757   mds e11: 1/1/2 up {0=a=up:active}, 1 up:standby
2011-07-01 09:14:35.522790   mds e12: 2/2/2 up {0=a=up:active,1=b=up:starting}
2011-07-01 09:14:36.359415   log 2011-07-01 09:14:35.522685 mon0 10.0.1.205:6789/0 8 : [INF] mds? 10.0.1.205:6806/4546 up:boot
2011-07-01 09:14:55.346502   mds e13: 2/2/2 up {0=a=up:active,1=b=up:starting(laggy or crashed)}


If you've done anything in the tree, it doesn't seem to even get that
far; it just gets stuck in the stopping state.
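
For anyone who wants to watch for this automatically, here is a rough
sketch of pulling the per-rank state out of the "mds eN:" summary lines
shown above. It assumes the line format is exactly what appears in this
log; the function names (parse_mds_line, ranks_not_active) are made up
for illustration and are not part of ceph.

```python
import re

# Matches the mds summary part of a monitor log line, e.g.
#   mds e9: 2/2/2 up {0=a=up:active,1=b=up:stopping}
# Format is assumed from the log in this mail, not from a spec.
MDS_LINE = re.compile(r"mds e(?P<epoch>\d+):\s+\S+\s+up\s+\{(?P<ranks>[^}]*)\}")

# Matches one rank entry, e.g. "1=b=up:starting(laggy or crashed)"
RANK_ENTRY = re.compile(
    r"(?P<rank>\d+)=(?P<name>[^=,]+)=(?P<state>[a-z]+:[a-z]+)"
    r"(?:\((?P<note>[^)]*)\))?"
)


def parse_mds_line(line):
    """Return (epoch, {rank: info}) or None if the line has no mds summary."""
    m = MDS_LINE.search(line)
    if m is None:
        return None
    ranks = {}
    for entry in RANK_ENTRY.finditer(m.group("ranks")):
        ranks[int(entry.group("rank"))] = {
            "name": entry.group("name"),
            "state": entry.group("state"),
            "note": entry.group("note"),  # e.g. "laggy or crashed", else None
        }
    return int(m.group("epoch")), ranks


def ranks_not_active(ranks):
    """Return the sorted ranks that are not cleanly up:active."""
    return sorted(
        rank
        for rank, info in ranks.items()
        if info["state"] != "up:active" or info["note"] is not None
    )
```

Feeding it successive summary lines and checking whether a rank stays in
up:stopping (or picks up a "laggy or crashed" note) across several
epochs would flag both of the failures described here.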
-Greg