Ceph on RHEL 7 with multiple OSD's

Actually, in EL7 iptables does not come installed by default; it ships with
firewalld instead. Just remove firewalld and install iptables, and you are
back in the game! Or learn firewalld, that will work too! :)
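
For example, to open the default Ceph ports with firewalld (assuming the
stock assignments: monitors on 6789/tcp, OSD daemons in the 6800-7300/tcp
range), something like this should do it:

    # firewall-cmd --permanent --add-port=6789/tcp        # monitor
    # firewall-cmd --permanent --add-port=6800-7300/tcp   # OSD daemons
    # firewall-cmd --reload

Or, if you prefer to go back to plain iptables on EL7:

    # systemctl stop firewalld && systemctl disable firewalld
    # yum install iptables-services
    # systemctl enable iptables && systemctl start iptables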


Marco Garcês
#sysadmin
Maputo - Mozambique
[Phone] +258 84 4105579
[Skype] marcogarces

On Tue, Sep 9, 2014 at 3:10 PM, Michal Kozanecki <mkozanecki at evertz.com>
wrote:

> Network issue maybe? Have you checked your firewall settings? iptables
> changed a bit in EL7 and might have broken any rules you normally try to
> use. Try flushing the rules (iptables -F) and see if that fixes things;
> if it does, then you'll need to fix your firewall rules.
>
> I ran into a similar issue on EL7 where the OSDs appeared up and in, but
> were stuck in peering; it turned out a few ports were being blocked.
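>
> If you want to verify reachability between the nodes, a bash one-liner
> like this works (using one of the OSD addresses from the dump below as
> an example):
>
>     $ timeout 1 bash -c 'echo > /dev/tcp/10.119.16.15/6800' \
>           && echo open || echo blocked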
>
> Cheers
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces at lists.ceph.com] On Behalf Of
> BG
> Sent: September-09-14 6:05 AM
> To: ceph-users at lists.ceph.com
> Subject: Re: Ceph on RHEL 7 with multiple OSD's
>
> Loic Dachary <loic at ...> writes:
>
> >
> > Hi,
> >
> > It looks like your osd.0 is down and you only have one osd left
> > (osd.1), which would explain why the cluster cannot get to a healthy
> > state. The "size 2" in "pool 0 'data' replicated size 2 ..." means
> > the pool needs at least two OSDs up to function properly. Do you know
> > why osd.0 is not up?
> >
> > Cheers
> >
>
> I've been trying unsuccessfully to get this up and running since then. I've
> added another OSD but still can't get to the "active+clean" state. I'm not
> even sure the problems I'm having are related to the OS version, but I'm
> running out of ideas, and unless somebody here can spot something obvious in
> the logs below I'm going to try rolling back to CentOS 6.
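>
> For reference, the per-pool replication settings can be queried like this:
>
>     $ ceph osd pool get data size
>     $ ceph osd pool get data min_size
>
> With size 3 and only three OSDs, all of them have to be up and able to
> reach each other before the PGs can go active+clean.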
>
> $ echo "HEALTH" && ceph health && echo "STATUS" && ceph status && echo
> "OSD_DUMP" && ceph osd dump HEALTH HEALTH_WARN 129 pgs peering; 129 pgs
> stuck unclean STATUS
>     cluster f68332e4-1081-47b8-9b22-e5f3dc1f4521
>      health HEALTH_WARN 129 pgs peering; 129 pgs stuck unclean
>      monmap e1: 1 mons at {hp09=10.119.16.14:6789/0}, election epoch 2,
> quorum
>      0 hp09
>      osdmap e43: 3 osds: 3 up, 3 in
>       pgmap v61: 192 pgs, 3 pools, 0 bytes data, 0 objects
>             15469 MB used, 368 GB / 383 GB avail
>                  129 peering
>                   63 active+clean
> OSD_DUMP
> epoch 43
> fsid f68332e4-1081-47b8-9b22-e5f3dc1f4521
> created 2014-09-09 10:42:35.490711
> modified 2014-09-09 10:47:25.077178
> flags
> pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool crash_replay_interval 45 stripe_width 0
> pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
> pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool stripe_width 0
> max_osd 3
> osd.0 up   in  weight 1 up_from 4 up_thru 42 down_at 0 last_clean_interval [0,0) 10.119.16.14:6800/24988 10.119.16.14:6801/24988 10.119.16.14:6802/24988 10.119.16.14:6803/24988 exists,up 63f3f351-eccc-4a98-8f18-e107bd33f82b
> osd.1 up   in  weight 1 up_from 38 up_thru 42 down_at 36 last_clean_interval [7,37) 10.119.16.15:6800/22999 10.119.16.15:6801/4022999 10.119.16.15:6802/4022999 10.119.16.15:6803/4022999 exists,up 8e1c029d-ebfb-4a8d-b567-ee9cd9ebd876
> osd.2 up   in  weight 1 up_from 42 up_thru 42 down_at 40 last_clean_interval [11,41) 10.119.16.16:6800/25605 10.119.16.16:6805/5025605 10.119.16.16:6806/5025605 10.119.16.16:6807/5025605 exists,up 5d398bba-59f5-41f8-9bd6-aed6a0204656
>
> Sample of warnings from monitor log:
> 2014-09-09 10:51:10.636325 7f75037d0700  1 mon.hp09@0(leader).osd e72 prepare_failure osd.1 10.119.16.15:6800/22999 from osd.2 10.119.16.16:6800/25605 is reporting failure:1
> 2014-09-09 10:51:10.636343 7f75037d0700  0 log [DBG] : osd.1 10.119.16.15:6800/22999 reported failed by osd.2 10.119.16.16:6800/25605
>
> Sample of warnings from osd.2 log:
> 2014-09-09 10:44:13.723714 7fb828c57700 -1 osd.2 18 heartbeat_check: no reply from osd.1 ever on either front or back, first ping sent 2014-09-09 10:43:30.437170 (cutoff 2014-09-09 10:43:53.723713)
> 2014-09-09 10:44:13.724883 7fb81f2f9700  0 log [WRN] : map e19 wrongly marked me down
> 2014-09-09 10:44:13.726104 7fb81f2f9700  0 osd.2 19 crush map has features 1107558400, adjusting msgr requires for mons
> 2014-09-09 10:44:13.726741 7fb811edb700  0 -- 10.119.16.16:0/25605 >> 10.119.16.15:6806/1022999 pipe(0x3171900 sd=34 :0 s=1 pgs=0 cs=0 l=1 c=0x3ad8580).fault
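>
> To rule out a daemon listening on different ports than the osd dump
> reports, the bound sockets on each node can be compared against the
> dump above, e.g.:
>
>     $ ss -tlnp | grep ceph-osd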
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users at lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com