Thank you very much, Colin!

I was trying to check the different variables set on the OSDs, MDSes and
MONs, but realized that we can't dump the conf of a MON. 'ceph osd dump
-o -' and 'ceph mds dump -o -' work fine for seeing the different
variables, but the same doesn't work for MONs. Do we have to use a
different syntax to dump the mon conf, or is it planned to port the
dump feature to MONs as well?

Regards,
Wilfrid

2011/6/12 Colin McCabe <cmccabe@xxxxxxxxxxxxxx>:
> On Sat, Jun 11, 2011 at 2:08 PM, Wilfrid Allembrand
> <wilfrid.allembrand@xxxxxxxxx> wrote:
>> Hi all,
>>
>> On my test cluster I have 3 MONs, 2 MDSes and 2 OSDs. I'm doing some
>> failover tests on the OSDs and noticed something strange in the status.
>> The two nodes hosting the OSDs have been shut down, but the status
>> continues to 'see' one of them as alive:
>
> Hi Wilfrid,
>
> Usually OSDMaps are propagated peer-to-peer amongst the OSDs. This
> means that OSDs that go down are rapidly detected. However, when all
> OSDs go down, there are no more OSDs left to send OSDMaps. In this
> case, we rely on a timeout in the monitor to determine that all the
> OSDs are down.
>
> After mon_osd_report_timeout seconds elapse without an OSDMap being
> sent from an OSD, the monitor marks it down. The default is 900
> seconds, or 15 minutes. So once you have waited 15 minutes, all the
> OSDs should be marked down.
>
> sincerely,
> Colin
>
>
>>
>> # ceph -v
>> ceph version 0.29 (commit:8e69c39f69936e2912a887247c6e268d1c9059ed)
>> # uname -a
>> Linux test2 2.6.38-8-server #42-Ubuntu SMP Mon Apr 11 03:49:04 UTC
>> 2011 x86_64 x86_64 x86_64 GNU/Linux
>>
>> root@test2:~# ceph health
>> 2011-06-11 17:03:38.492734 mon <- [health]
>> 2011-06-11 17:03:38.493913 mon1 -> 'HEALTH_WARN 594 pgs degraded,
>> 551/1102 degraded (50.000%); 1/2 osds down, 1/2 osds out' (0)
>>
>> root@test2:~# ceph osd stat
>> 2011-06-11 17:03:48.071885 mon <- [osd,stat]
>> 2011-06-11 17:03:48.073290 mon1 -> 'e31: 2 osds: 1 up, 1 in' (0)
>>
>> root@test2:~# ceph mds stat
>> 2011-06-11 17:03:54.868986 mon <- [mds,stat]
>> 2011-06-11 17:03:54.870418 mon1 -> 'e48: 1/1/1 up {0=test4=up:active},
>> 1 up:standby' (0)
>>
>> root@test2:~# ceph mon stat
>> 2011-06-11 17:04:09.638549 mon <- [mon,stat]
>> 2011-06-11 17:04:09.639994 mon0 -> 'e1: 3 mons at
>> {0=10.1.56.231:6789/0,1=10.1.56.232:6789/0,2=10.1.56.233:6789/0},
>> election epoch 508, quorum 0,1,2' (0)
>>
>> How can that be? Is it a bug?
>> (Rest assured, I triple-checked that my two OSD nodes are really
>> shut down.)
>>
>> Thanks!
>> Wilfrid
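
A note on the question above: 'ceph osd dump' and 'ceph mds dump' print
the cluster-wide OSDMap and MDSMap rather than a daemon's configuration,
and 0.29 has no matching dump for the mon conf. Later Ceph releases
added an admin-socket interface that can show a running daemon's
effective configuration; a minimal sketch, assuming such a release, a
monitor id of "a", and the conventional default socket path:

    # Ask a running monitor for its effective configuration
    # through its admin socket (later Ceph releases):
    ceph daemon mon.a config show

    # Long form, pointing at the socket file directly
    # (the path below is the conventional default, an assumption):
    ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok config show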
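
On the timeout side, the 15-minute window Colin describes is controlled
by mon_osd_report_timeout. On a test cluster where faster detection is
wanted when every OSD dies at once, the value could be lowered in
ceph.conf; a minimal sketch, assuming the option name from Colin's
reply and the usual ceph.conf section layout (60 seconds is only an
illustrative value, not a recommendation):

    [mon]
        # Mark unreporting OSDs down after 60 seconds instead of
        # the default 900 (illustrative value only)
        mon osd report timeout = 60

Since ceph.conf is read at startup, the monitors would need a restart
to pick this up from the file.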
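
Alternatively, later Ceph releases can inject the change into running
monitors without a restart; a sketch of that, not verified against
0.29:

    # Push the new timeout into every monitor at runtime
    # (tell/injectargs syntax from later releases):
    ceph tell mon.* injectargs '--mon_osd_report_timeout 60'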