Re: OSD wrongly marked up (well, half up)

Colin McCabe <cmccabe@xxxxxxxxxxxxxx> · Sat, 11 Jun 2011 21:20:38 -0700

On Sat, Jun 11, 2011 at 2:08 PM, Wilfrid Allembrand
<wilfrid.allembrand@xxxxxxxxx> wrote:
> Hi all,
>
> On my test cluster I have 3 MONs, 2 MDSs and 2 OSDs. I'm doing some
> failover tests on the OSDs and got a strange result in the status.
> The 2 nodes hosting the OSDs have been shut down, but the status
> continues to 'see' one alive:

Hi Wilfrid,

Usually OSDMaps are propagated peer-to-peer amongst the OSDs. This
means that an OSD that goes down is rapidly detected, as long as
another OSD is still running. However, when all OSDs go down, there
are no OSDs left to send OSDMaps. In this case, we rely on a timeout
in the monitor to determine that all the OSDs are down.

After mon_osd_report_timeout seconds elapse without an OSDMap being
sent from an OSD, the monitor marks that OSD down. The default is 900
seconds (15 minutes), so once you have waited 15 minutes, all the
OSDs should be marked down.
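
To make the timing concrete, here is a toy Python model of the
behaviour described above. It is only a sketch, not Ceph code: the
class ToyMonitor and its methods are hypothetical names made up for
illustration, and it assumes the first OSD was reported down by its
peer while the second one was still running.

```python
MON_OSD_REPORT_TIMEOUT = 900  # seconds; the default mentioned above


class ToyMonitor:
    """Hypothetical model of the monitor-side timeout; not Ceph code."""

    def __init__(self, osd_ids, now, timeout=MON_OSD_REPORT_TIMEOUT):
        self.timeout = timeout
        # Time at which each OSD was last heard from.
        self.last_report = {osd: now for osd in osd_ids}
        self.up = {osd: True for osd in osd_ids}

    def report(self, osd, now):
        """An OSD sent something to the monitor: refresh its timestamp."""
        self.last_report[osd] = now
        self.up[osd] = True

    def peer_failure_report(self, osd):
        """A surviving OSD told the monitor that `osd` is down."""
        self.up[osd] = False

    def tick(self, now):
        """Mark down every OSD that has been silent for `timeout` seconds."""
        for osd, seen in self.last_report.items():
            if self.up[osd] and now - seen >= self.timeout:
                self.up[osd] = False

    def stat(self):
        n_up = sum(self.up.values())
        return f"{len(self.up)} osds: {n_up} up"


mon = ToyMonitor([0, 1], now=0)
mon.peer_failure_report(0)   # osd1 noticed osd0 going down
# osd1 is now shut down too, but no OSD is left to say so.
mon.tick(now=60)
print(mon.stat())            # 2 osds: 1 up
mon.tick(now=900)
print(mon.stat())            # 2 osds: 0 up
```

If 15 minutes is too long to wait on a test cluster, the timeout
should be adjustable through the "mon osd report timeout" option in
the [mon] section of ceph.conf.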

sincerely,
Colin

>
> # ceph -v
> ceph version 0.29 (commit:8e69c39f69936e2912a887247c6e268d1c9059ed)
> # uname -a
> Linux test2 2.6.38-8-server #42-Ubuntu SMP Mon Apr 11 03:49:04 UTC
> 2011 x86_64 x86_64 x86_64 GNU/Linux
>
> root@test2:~# ceph health
> 2011-06-11 17:03:38.492734 mon <- [health]
> 2011-06-11 17:03:38.493913 mon1 -> 'HEALTH_WARN 594 pgs degraded,
> 551/1102 degraded (50.000%); 1/2 osds down, 1/2 osds out' (0)
>
> root@test2:~# ceph osd stat
> 2011-06-11 17:03:48.071885 mon <- [osd,stat]
> 2011-06-11 17:03:48.073290 mon1 -> 'e31: 2 osds: 1 up, 1 in' (0)
>
> root@test2:~# ceph mds stat
> 2011-06-11 17:03:54.868986 mon <- [mds,stat]
> 2011-06-11 17:03:54.870418 mon1 -> 'e48: 1/1/1 up {0=test4=up:active},
> 1 up:standby' (0)
>
> root@test2:~# ceph mon stat
> 2011-06-11 17:04:09.638549 mon <- [mon,stat]
> 2011-06-11 17:04:09.639994 mon0 -> 'e1: 3 mons at
> {0=10.1.56.231:6789/0,1=10.1.56.232:6789/0,2=10.1.56.233:6789/0},
> election epoch 508, quorum 0,1,2' (0)
>
> How can this be? Is it a bug?
> (Rest assured, I triple-checked that my 2 OSD nodes are really shut down.)
>
> Thanks !
> Wilfrid
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>