Re: Still CRUSH problems with 0.94.1 ? (explained)

Hi all,

following up on my email from yesterday, I have interesting information confirming that the problem is not related to Hammer at all. Seeing nothing that could explain the weird behaviour, I reinstalled Giant and got the same symptoms, which made me think it had to be hardware related...

And it was!

Our nodes are dual-attached: 2 x 10 Gbps links on emX interfaces, bonded with LACP (mode 4). The IP is on the bond0 interface. I discovered that one of the em interfaces of a host wasn't talking and was in fact down.
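
This is easy to spot on the node itself (em1 here is just an example name from our setup; adjust to yours):

ROOT > grep -A1 "Slave Interface" /proc/net/bonding/bond0   # MII Status per bond slave, "down" for the dead one
ROOT > ip link show em1                                     # link state of the slave itself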

Even though bond0 is UP and the host communicates normally, a physical interface being down results in stuck/unclean/peering PGs!

This leads me to think that on startup the ceph-osd processes bind themselves to one or the other of the two emX physical interfaces, whatever their state, and not to the bond0 interface as they should.
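
One way to check what the daemons are actually using (just a rough sketch; adapt the grep patterns to your setup):

ROOT > ceph osd dump | grep "^osd"     # addresses each OSD has registered in the osdmap
ROOT > ss -tnp | grep ceph-osd         # local addresses/ports of the OSD connections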

Bring the faulty interface back UP, restart ceph on the node, and all the stuck/unclean/peering PGs disappear.
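
Concretely, something like this (em1 is an example name, and the restart command depends on how ceph was deployed on the node):

ROOT > ip link set em1 up       # bring the failed slave back into the bond
ROOT > service ceph restart     # restart the ceph daemons on this node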

Now say you lose a physical interface during normal operation (without restarting ceph on the node) and then generate some activity (like a pool create)... -> peering/stuck/unclean PGs reappear, demonstrating that the processes are attached to the physical interface.
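
To reproduce (interface and pool names are just examples):

ROOT > ip link set em1 down                  # simulate losing one physical link on a node
ROOT > ceph osd pool create testpool 64      # any activity, e.g. a pool create
ROOT > ceph -s                               # the stuck/unclean/peering PGs show up again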

Now the question, since this compromises redundancy: is this behaviour by design?

Frederic


fred@xxxxxxxxxx <fred@xxxxxxxxxx> wrote on 21/04/15 15:03:
Hi all,

could there be a problem with the CRUSH function during a 'from scratch' installation of 0.94.1-0?

This has been tested many times, with ceph-deploy-1.5.22-0 and ceph-deploy-1.5.23-0, on RHEL7.

Each time, the new cluster ends up in a weird state I never saw with my previously installed versions (0.94, 0.87.1):

- I've seen things that may be linked to ceph-deploy-1.5.23-0, such as one or more monitors being unable to form the cluster (with respawning 'python /usr/sbin/ceph-create-keys' messages), but I think that's a separate part of the issue.
- The main issue shows up as a warning on the health of the PGs as soon as the cluster is formed enough to answer a 'ceph -s'.

- Here is a 1-mon, almost empty, freshly installed cluster:

ROOT > ceph -s
   cluster e581ab43-d0f5-4ea8-811f-94c8df16d044
    health HEALTH_WARN
           2 pgs degraded
           14 pgs peering
           4 pgs stale
           2 pgs stuck degraded
           25 pgs stuck inactive
           4 pgs stuck stale
           27 pgs stuck unclean
           2 pgs stuck undersized
           2 pgs undersized
           too few PGs per OSD (3 < min 30)
    monmap e1: 1 mons at {helga=10.10.10.64:6789/0}
           election epoch 2, quorum 0 helga
    osdmap e398: 60 osds: 60 up, 60 in; 2 remapped pgs
     pgmap v1553: 64 pgs, 1 pools, 0 bytes data, 0 objects
           2829 MB used, 218 TB / 218 TB avail
                 37 active+clean
                 12 peering
                 11 activating
                  2 stale+active+undersized+degraded
                  2 stale+remapped+peering
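
(Side note: the 'too few PGs per OSD' warning by itself is expected on such a fresh cluster: 64 PGs x 3 replicas (the default pool size) / 60 OSDs gives roughly 3 PGs per OSD, hence the '3 < min 30'. The peering/stale/stuck states are what worry me.)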

With time, the number of defects grows, and they literally explode as soon as we put objects on the cluster.

- A 'ceph health detail' shows, for example, entries like this one:
pg 0.22 is stuck inactive since forever, current state peering, last acting [18,17,0]

- A query on the PG shows:
ceph pg 0.22 query
{
   "state": "peering",
../..
    "up": [
       18,
       17,
       0
   ],
          "blocked_by": [
               0,
               1,
               5,
               17
           ],
../..
}


If my understanding of the ceph query output is correct, OSDs 1 and 5 have nothing to do with this PG (they are not in its up set at all)... Where do they come from? Couldn't this be part of the "critical issues with CRUSH" that 0.94.1 is meant to correct?
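
For reference, to double-check what CRUSH currently maps this PG to, something like:

ROOT > ceph pg map 0.22     # prints the osdmap epoch and the up/acting sets for the PG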

Frederic

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




