mkcephfs questions

What version of Ceph are you using? mkcephfs was deprecated with the
Cuttlefish release. You can still see the old documentation for mkcephfs
here: http://ceph.com/docs/cuttlefish/start/. However, most people use
ceph-deploy to bootstrap a cluster now.
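
For reference, a minimal ceph-deploy bootstrap looks roughly like this
(hostnames node1-node3 and the disk paths are placeholders, not your actual
nodes):

    ceph-deploy new node1 node2 node3       # write an initial ceph.conf and monmap
    ceph-deploy install node1 node2 node3   # install the Ceph packages on each node
    ceph-deploy mon create-initial          # create the monitors and gather keys
    ceph-deploy osd prepare node1:/dev/sdb  # prepare a data disk for an OSD
    ceph-deploy osd activate node1:/dev/sdb1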

The output shows that exactly half of the objects are degraded (21/42,
50.000%). With two replicas per object, that is a clue. I think the problem
may be in your network configuration. Ceph clients, monitors, OSDs, and
metadata servers talk on the public (front side) network. All monitors and
all OSDs MUST be able to talk on the public (front side) network. It's how
they report their status, among other things.

Only OSDs talk on the cluster (back side) network. See the diagram and
discussion here:
http://ceph.com/docs/master/rados/configuration/network-config-ref/
The reason for setting up a cluster network is to place heartbeat and
replication traffic on a separate network, so that client-to-monitor and
client-to-OSD traffic doesn't suffer latency from replication, rebalancing,
and recovery traffic. For example, the heartbeat traffic could be conducted
on the cluster (back side) network
(http://ceph.com/docs/master/rados/configuration/mon-osd-interaction/),
while the OSDs' status reports to the monitors go over the public (front
side) network.
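
As a sketch, the corresponding ceph.conf settings look like this (the
subnets are placeholders for your own; for a first cluster you can simply
omit the cluster network line):

    [global]
        public network = 192.168.0.0/24   ; clients, monitors, OSDs and MDS
        cluster network = 10.0.1.0/24     ; OSD replication/heartbeat traffic only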

If this is your first cluster, I would simply follow the quick start and
use one public network, and then build out the cluster network once you
have a cluster with a public network up and running.
http://ceph.com/docs/master/start/

Looking at your log file, I see at least these problems:

1. It looks like 3 of your OSDs get reported down early in the process.

2014-04-30 03:24:25.222104 mon.0 192.168.0.2:6789/0 81052 : [INF] osd.22
192.168.0.4:6818/33528 boot
2014-04-30 03:24:25.222203 mon.0 192.168.0.2:6789/0 81053 : [INF] osdmap
e3473: 24 osds: 21 up, 24 in
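
You can see exactly which OSDs the monitor considers down with, for example:

    ceph osd tree | grep down   # the down OSDs and the hosts they live on
    ceph osd dump | grep down   # their addresses, which should be reachable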

2. The OSDs then tell the monitor that they were wrongly marked down.

2014-04-30 03:24:27.365309 mon.0 192.168.0.2:6789/0 81058 : [INF] osdmap
e3474: 24 osds: 23 up, 24 in
2014-04-30 03:24:29.152186 osd.20 192.168.0.4:6812/33396 254 : [WRN] map
e3474 wrongly marked me down
2014-04-30 03:24:24.676894 osd.22 192.168.0.4:6818/33528 254 : [WRN] map
e3472 wrongly marked me down
2014-04-30 03:24:27.055700 osd.21 192.168.0.4:6815/33462 254 : [WRN] map
e3473 wrongly marked me down

This means that they were up and running, but other OSDs were pinging them
and not getting an answer. That's the heartbeat process.
http://ceph.com/docs/master/rados/configuration/mon-osd-interaction/.
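
If the OSDs are reachable but the network is simply slow, you can lengthen
the heartbeat grace period as a stopgap while you investigate; a sketch
using the documented heartbeat settings (the values shown are the defaults,
so raise the grace value to suit):

    [osd]
        osd heartbeat interval = 6   ; seconds between pings to peer OSDs
        osd heartbeat grace = 20     ; seconds without a reply before a peer is reported down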

3. They finally all report in as up and running:

2014-04-30 03:29:24.708134 mon.0 192.168.0.2:6789/0 81682 : [INF] osdmap
e3501: 24 osds: 24 up, 24 in
2014-04-30 03:29:24.808498 mon.0 192.168.0.2:6789/0 81683 : [INF] pgmap
v9214: 4800 pgs: 4800 active+degraded; 9470 bytes data, 3245 MB used, 94773
MB / 98019 MB avail; 21/42 objects degraded (50.000%)
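
You can list the stuck placement groups themselves and see which OSDs each
one maps to with, for example:

    ceph health detail           # names the degraded/stuck pgs
    ceph pg dump_stuck unclean   # one line per stuck pg, with its acting OSD set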

4. Then, some OSDs get wrongly marked down again.
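
Given the flapping, it would also be worth confirming from each OSD host
that every other OSD host is reachable on the interfaces Ceph is using; a
quick check (the addresses here are from your monmap, substitute your own):

    ping -c 3 192.168.0.3            # peer OSD host on the public network
    ping -c 3 192.168.0.4
    netstat -tlnp | grep ceph-osd    # interfaces/ports the local OSDs listen on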

On Sun, May 4, 2014 at 12:49 AM, Cao, Buddy <buddy.cao at intel.com> wrote:

> Haomai,
>
> I attached the logs again. Sorry, I'm just starting to learn Ceph and am
> not experienced enough to analyze the logs and find the root cause. Please
> advise.
>
>
>
> Wei Cao (Buddy)
>
> -----Original Message-----
> From: Haomai Wang [mailto:haomaiwang at gmail.com]
> Sent: Wednesday, April 30, 2014 4:58 PM
> To: Cao, Buddy
> Cc: ceph-users at lists.ceph.com
> Subject: Re: mkcephfs questions
>
> OK, actually I just learned about it. It looks OK.
>
> According to the log, many OSDs try to boot repeatedly. I think the
> problem may be on the monitor side. Could you check the monitor node? The
> ceph-mon.log you provided is blank.
>
> On Wed, Apr 30, 2014 at 3:59 PM, Cao, Buddy <buddy.cao at intel.com> wrote:
> > Yes, I set "osd journal size = 0" on purpose; I'd like to use all of the
> > space on the journal device. I think I got the idea from the Ceph
> > website... Yes, I do run "mkcephfs -a -c /etc/ceph/ceph.conf -k
> > /etc/ceph/keyring.admin" to create the Ceph cluster, and it succeeded.
> >
> > Do you think "osd journal size = 0" would cause any problems?
> >
> >
> > Wei Cao (Buddy)
> >
> > -----Original Message-----
> > From: Haomai Wang [mailto:haomaiwang at gmail.com]
> > Sent: Wednesday, April 30, 2014 3:48 PM
> > To: Cao, Buddy
> > Cc: ceph-users at lists.ceph.com
> > Subject: Re: mkcephfs questions
> >
> > I found "osd journal size = 0" in your ceph.conf.
> > Do you really run mkcephfs with this? I think it will fail.
> >
> > On Wed, Apr 30, 2014 at 2:42 PM, Cao, Buddy <buddy.cao at intel.com> wrote:
> >> Here you go... I did not see any stuck unclean related log entries...
> >>
> >>
> >>
> >> Wei Cao (Buddy)
> >>
> >> -----Original Message-----
> >> From: Haomai Wang [mailto:haomaiwang at gmail.com]
> >> Sent: Wednesday, April 30, 2014 2:12 PM
> >> To: Cao, Buddy
> >> Cc: ceph-users at lists.ceph.com
> >> Subject: Re: mkcephfs questions
> >>
> >> Hmm, there must be another problem at play. Maybe more logs could
> >> explain it.
> >>
> >> ceph.log
> >> ceph-mon.log
> >>
> >> On Wed, Apr 30, 2014 at 12:06 PM, Cao, Buddy <buddy.cao at intel.com> wrote:
> >>> Thanks for your reply, Haomai. What I don't understand is why the
> >>> number of stuck unclean pgs stays the same after 12 hours. Is that the
> >>> common behavior or not?
> >>>
> >>>
> >>> Wei Cao (Buddy)
> >>>
> >>> -----Original Message-----
> >>> From: Haomai Wang [mailto:haomaiwang at gmail.com]
> >>> Sent: Wednesday, April 30, 2014 11:36 AM
> >>> To: Cao, Buddy
> >>> Cc: ceph-users at lists.ceph.com
> >>> Subject: Re: mkcephfs questions
> >>>
> >>> The result of "ceph -s" should tell you the reason. Only 21 OSDs are
> >>> up, but we need all 24 OSDs up.
> >>>
> >>> On Wed, Apr 30, 2014 at 11:21 AM, Cao, Buddy <buddy.cao at intel.com> wrote:
> >>>> Hi,
> >>>>
> >>>>
> >>>>
> >>>> I set up a Ceph cluster through the mkcephfs command. After I enter
> >>>> "ceph -s", it always returns 4950 stuck unclean pgs. I tried the same
> >>>> "ceph -s" after 12 hrs; it still returns the same number of unclean
> >>>> pgs, nothing changed. Does mkcephfs always have this problem, or did I
> >>>> do something wrong? I attached the results of "ceph -s" and "ceph osd
> >>>> tree" and the ceph.conf I have; please kindly help.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> [root at ceph]# ceph -s
> >>>>
> >>>>     cluster 99fd4ff8-0fb8-47b9-8179-fefbba1c2503
> >>>>
> >>>>      health HEALTH_WARN 4950 pgs degraded; 4950 pgs stuck unclean;
> >>>> recovery
> >>>> 21/42 objects degraded (50.000%); 3/24 in osds are down; clock skew
> >>>> detected on mon.1, mon.2
> >>>>
> >>>>      monmap e1: 3 mons at
> >>>> {0=192.168.0.2:6789/0,1=192.168.0.3:6789/0,2=192.168.0.4:6789/0},
> >>>> election epoch 6, quorum 0,1,2 0,1,2
> >>>>
> >>>>      mdsmap e4: 1/1/1 up {0=0=up:active}
> >>>>
> >>>>      osdmap e6019: 24 osds: 21 up, 24 in
> >>>>
> >>>>       pgmap v16445: 4950 pgs, 6 pools, 9470 bytes data, 21 objects
> >>>>
> >>>>             4900 MB used, 93118 MB / 98019 MB avail
> >>>>
> >>>>             21/42 objects degraded (50.000%)
> >>>>
> >>>>                 4950 active+degraded
> >>>>
> >>>>
> >>>>
> >>>> [root at ceph]# ceph osd tree    # partial output shown
> >>>>
> >>>> # id    weight  type name       up/down reweight
> >>>>
> >>>> -36     25      root vsm
> >>>>
> >>>> -31     3.2             storage_group ssd
> >>>>
> >>>> -16     3                       zone zone_a_ssd
> >>>>
> >>>> -1      1                               host vsm2_ssd_zone_a
> >>>>
> >>>> 2       1                                       osd.2   up      1
> >>>>
> >>>> -6      1                               host vsm3_ssd_zone_a
> >>>>
> >>>> 10      1                                       osd.10  up      1
> >>>>
> >>>> -11     1                               host vsm4_ssd_zone_a
> >>>>
> >>>> 18      1                                       osd.18  up      1
> >>>>
> >>>> -21     0.09999                 zone zone_c_ssd
> >>>>
> >>>> -26     0.09999                 zone zone_b_ssd
> >>>>
> >>>> -33     3.2             storage_group sata
> >>>>
> >>>> -18     3                       zone zone_a_sata
> >>>>
> >>>> -3      1                               host vsm2_sata_zone_a
> >>>>
> >>>> 1       1                                       osd.1   up      1
> >>>>
> >>>> -8      1                               host vsm3_sata_zone_a
> >>>>
> >>>> 9       1                                       osd.9   up      1
> >>>>
> >>>> -13     1                               host vsm4_sata_zone_a
> >>>>
> >>>> 17      1                                       osd.17  up      1
> >>>>
> >>>> -23     0.09999                 zone zone_c_sata
> >>>>
> >>>> -28     0.09999                 zone zone_b_sata
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Wei Cao (Buddy)
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> ceph-users mailing list
> >>>> ceph-users at lists.ceph.com
> >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Best Regards,
> >>>
> >>> Wheat
> >>
> >>
> >>
> >> --
> >> Best Regards,
> >>
> >> Wheat
> >
> >
> >
> > --
> > Best Regards,
> >
> > Wheat
>
>
>
> --
> Best Regards,
>
> Wheat
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
John Wilkins
Senior Technical Writer
Inktank
john.wilkins at inktank.com
(415) 425-9599
http://inktank.com