HEALTH_ERR, size and min_size

Hello list,
I have size = 3 and min_size = 2 on a cluster with 3 nodes.

My OSDs:

ceph osd tree
ID CLASS WEIGHT   TYPE NAME       STATUS REWEIGHT PRI-AFF
-1       60.17775 root default
-2       20.21155     host ceph01
 0   hdd  1.71089         osd.0       up  1.00000 1.00000
 8   hdd  1.71660         osd.8       up  1.00000 1.00000
 9   hdd  2.67029         osd.9       up  1.00000 1.00000
11   hdd  1.71649         osd.11      up  1.00000 1.00000
12   hdd  2.67020         osd.12      up  1.00000 1.00000
14   hdd  2.67020         osd.14      up  1.00000 1.00000
18   hdd  1.71649         osd.18      up  1.00000 1.00000
22   hdd  2.67020         osd.22      up  1.00000 1.00000
23   hdd  2.67020         osd.23      up  1.00000 1.00000
-3       19.08154     host ceph02
 2   hdd  2.67029         osd.2       up  1.00000 1.00000
 3   hdd  2.70000         osd.3       up  1.00000 1.00000
 7   hdd  2.67029         osd.7       up  1.00000 1.00000
13   hdd  2.67020         osd.13      up  1.00000 1.00000
16   hdd  1.59999         osd.16      up  1.00000 1.00000
19   hdd  2.38409         osd.19      up  1.00000 1.00000
24   hdd  2.67020         osd.24      up  1.00000 1.00000
25   hdd  1.71649         osd.25      up  1.00000 1.00000
-4       20.88466     host ceph03
 1   hdd  1.71660         osd.1       up  1.00000 1.00000
 4   hdd  2.67020         osd.4       up  1.00000 1.00000
 5   hdd  1.71660         osd.5       up  1.00000 1.00000
 6   hdd  1.71660         osd.6       up  1.00000 1.00000
15   hdd  2.67020         osd.15      up  1.00000 1.00000
17   hdd  1.62109         osd.17      up  1.00000 1.00000
20   hdd  1.71649         osd.20      up  1.00000 1.00000
21   hdd  2.67020         osd.21      up  1.00000 1.00000
27   hdd  1.71649         osd.27      up  1.00000 1.00000
32   hdd  2.67020         osd.32      up  1.00000 1.00000

I replaced two OSDs on node ceph01 and ran into HEALTH_ERR.
Is the cluster just waiting for the backfill to finish?
And why did I end up in HEALTH_ERR at all? I thought all data would still
be available on at least one other node, or even two:

HEALTH_ERR 343351/10358292 objects misplaced (3.315%); Reduced data
availability: 19 pgs inactive; Degraded data redundancy:
639455/10358292 objects degraded (6.173%), 208 pgs degraded, 204 pgs
undersized; application not enabled on 1 pool(s); 29 slow requests are
blocked > 32 sec. Implicated osds ; 29 stuck requests are blocked >
4096 sec. Implicated osds 2,19,24
OBJECT_MISPLACED 343351/10358292 objects misplaced (3.315%)
PG_AVAILABILITY Reduced data availability: 19 pgs inactive
    pg 0.4 is stuck inactive for 4227.236803, current state
undersized+degraded+remapped+backfilling+peered, last acting [19]
    pg 0.12 is stuck inactive for 4227.267137, current state
undersized+degraded+remapped+backfilling+peered, last acting [13]
    pg 0.1b is stuck inactive for 4198.153642, current state
undersized+degraded+remapped+backfill_wait+peered, last acting [24]
    pg 0.1f is stuck inactive for 4226.574006, current state
undersized+degraded+remapped+backfilling+peered, last acting [19]
    pg 0.61 is stuck inactive for 4227.316336, current state
undersized+degraded+remapped+backfilling+peered, last acting [2]
    pg 0.85 is stuck inactive for 4227.287134, current state
undersized+degraded+remapped+backfill_wait+peered, last acting [13]
    pg 0.88 is stuck inactive for 4197.261935, current state
undersized+degraded+remapped+backfill_wait+peered, last acting [24]
    pg 0.bd is stuck inactive for 4226.607646, current state
undersized+degraded+remapped+backfilling+peered, last acting [2]
    pg 0.fc is stuck inactive for 4226.642664, current state
undersized+degraded+remapped+backfill_wait+peered, last acting [13]
    pg 0.140 is stuck inactive for 4198.277165, current state
undersized+degraded+remapped+backfilling+peered, last acting [2]
    pg 0.16c is stuck inactive for 4198.268985, current state
undersized+degraded+remapped+backfilling+peered, last acting [7]
    pg 0.21f is stuck inactive for 4198.228206, current state
undersized+degraded+remapped+backfilling+peered, last acting [2]
    pg 0.222 is stuck inactive for 4198.241280, current state
undersized+degraded+remapped+backfilling+peered, last acting [2]
    pg 0.27f is stuck inactive for 4198.201034, current state
undersized+degraded+remapped+backfill_wait+peered, last acting [19]
    pg 0.297 is stuck inactive for 4197.247869, current state
undersized+degraded+remapped+backfilling+peered, last acting [24]
    pg 0.298 is stuck inactive for 4226.572652, current state
undersized+degraded+remapped+backfilling+peered, last acting [19]
    pg 0.2cd is stuck inactive for 4226.643455, current state
undersized+degraded+remapped+backfilling+peered, last acting [16]
    pg 0.314 is stuck inactive for 4227.339749, current state
undersized+degraded+remapped+backfilling+peered, last acting [2]
    pg 0.375 is stuck inactive for 4227.260662, current state
undersized+degraded+remapped+backfilling+peered, last acting [19]
PG_DEGRADED Degraded data redundancy: 639455/10358292 objects degraded
(6.173%), 208 pgs degraded, 204 pgs undersized
    pg 0.17a is active+undersized+degraded+remapped+backfilling, acting [24,4]
    pg 0.17f is stuck undersized for 3811.397010, current state
active+undersized+degraded+remapped+backfill_wait, last acting [19,17]
    pg 0.182 is stuck undersized for 10640.416744, current state
active+undersized+degraded+remapped+backfill_wait, last acting [14,16]
    pg 0.184 is stuck undersized for 3938.548717, current state
active+undersized+degraded+remapped+backfill_wait, last acting [7,1]
    pg 0.195 is stuck undersized for 3939.556198, current state
active+undersized+degraded+remapped+backfill_wait, last acting [21,16]
    pg 0.196 is stuck undersized for 4196.543567, current state
active+undersized+degraded+remapped+backfilling, last acting [3,20]
    pg 0.337 is stuck undersized for 3938.457718, current state
active+undersized+degraded+remapped+backfill_wait, last acting [15,13]
    pg 0.33c is stuck undersized for 10715.420596, current state
active+undersized+degraded+remapped+backfilling, last acting [2,12]
    pg 0.340 is stuck undersized for 3811.450013, current state
active+undersized+degraded+remapped+backfilling, last acting [21,19]
    pg 0.345 is stuck undersized for 3939.510525, current state
active+undersized+degraded+remapped+backfill_wait, last acting [4,24]
    pg 0.346 is stuck undersized for 10639.199276, current state
active+undersized+degraded+remapped+backfill_wait, last acting [18,2]
    pg 0.34c is stuck undersized for 3811.523689, current state
active+undersized+degraded+remapped+backfill_wait, last acting [2,15]
    pg 0.351 is stuck undersized for 3811.347509, current state
active+undersized+degraded+remapped+backfill_wait, last acting [2,4]
    pg 0.356 is stuck undersized for 3811.671104, current state
active+undersized+degraded+remapped+backfill_wait, last acting [0,24]
    pg 0.35b is stuck undersized for 4191.430143, current state
active+undersized+degraded+remapped+backfilling, last acting [16,20]
    pg 0.35c is stuck undersized for 3939.514422, current state
active+undersized+degraded+remapped+backfill_wait, last acting [4,25]
    pg 0.35d is stuck undersized for 3938.543293, current state
active+undersized+degraded+remapped+backfill_wait, last acting [19,32]
    pg 0.365 is stuck undersized for 3938.524132, current state
active+undersized+degraded+remapped+backfill_wait, last acting [4,25]
    pg 0.36c is stuck undersized for 10715.466460, current state
active+undersized+degraded+remapped+backfilling, last acting [2,14]
    pg 0.36d is stuck undersized for 3939.540201, current state
active+undersized+degraded+remapped+backfill_wait, last acting [3,32]
    pg 0.370 is stuck undersized for 4191.552409, current state
active+undersized+degraded+remapped+backfilling, last acting [13,21]
    pg 0.371 is stuck undersized for 3938.440298, current state
active+undersized+degraded+remapped+backfill_wait, last acting [1,13]
    pg 0.375 is stuck undersized for 3938.545599, current state
undersized+degraded+remapped+backfilling+peered, last acting [19]
    pg 0.381 is stuck undersized for 3811.517412, current state
active+undersized+degraded+remapped+backfill_wait, last acting [4,3]
    pg 0.38a is stuck undersized for 10640.436011, current state
active+undersized+degraded+remapped+backfill_wait, last acting [2,11]
    pg 0.38b is stuck undersized for 4191.525469, current state
active+undersized+degraded+remapped+backfilling, last acting [24,32]
    pg 0.391 is stuck undersized for 3810.314900, current state
active+undersized+degraded+remapped+backfill_wait, last acting [1,19]
    pg 0.394 is stuck undersized for 3811.492367, current state
active+undersized+degraded+remapped+backfill_wait, last acting [3,14]
    pg 0.397 is stuck undersized for 4191.488161, current state
active+undersized+degraded+remapped+backfilling, last acting [7,32]
    pg 0.39a is stuck undersized for 3941.583783, current state
active+undersized+degraded+remapped+backfill_wait, last acting [11,19]
    pg 0.3a1 is stuck undersized for 3811.656295, current state
active+undersized+degraded+remapped+backfilling, last acting [2,4]
    pg 0.3a5 is stuck undersized for 3939.536321, current state
active+undersized+degraded+remapped+backfilling, last acting [24,20]
    pg 0.3ab is stuck undersized for 10640.435197, current state
active+undersized+degraded+remapped+backfill_wait, last acting [2,11]
    pg 0.3bb is stuck undersized for 10639.374080, current state
active+undersized+degraded+remapped+backfill_wait, last acting [14,24]
    pg 0.3c1 is stuck undersized for 3811.566173, current state
active+undersized+degraded+remapped+backfill_wait, last acting [19,17]
    pg 0.3c3 is stuck undersized for 10641.420944, current state
active+undersized+degraded+remapped+backfill_wait, last acting [2,23]
    pg 0.3c4 is stuck undersized for 3811.554642, current state
active+undersized+degraded+remapped+backfill_wait, last acting [19,21]
    pg 0.3c9 is stuck undersized for 4219.043674, current state
active+undersized+degraded+remapped+backfilling, last acting [13,4]
    pg 0.3cf is stuck undersized for 3941.146510, current state
active+undersized+degraded+remapped+backfill_wait, last acting [16,23]
    pg 0.3d0 is stuck undersized for 3938.433337, current state
active+undersized+degraded+remapped+backfill_wait, last acting [1,24]
    pg 0.3e6 is stuck undersized for 3939.459758, current state
active+undersized+degraded+remapped+backfill_wait, last acting [2,4]
    pg 0.3e9 is stuck undersized for 10640.420901, current state
active+undersized+degraded+remapped+backfill_wait, last acting [22,2]
    pg 0.3eb is stuck undersized for 3811.573977, current state
active+undersized+degraded+remapped+backfill_wait, last acting [1,13]
    pg 0.3ed is stuck undersized for 3939.549283, current state
active+undersized+degraded+remapped+backfill_wait, last acting [3,4]
    pg 0.3f1 is stuck undersized for 3938.542883, current state
active+undersized+degraded+remapped+backfill_wait, last acting [16,32]
    pg 0.3f2 is stuck undersized for 10639.375600, current state
active+undersized+degraded+remapped+backfill_wait, last acting [23,13]
    pg 0.3f3 is stuck undersized for 3811.496577, current state
active+undersized+degraded+remapped+backfill_wait, last acting [3,32]
    pg 0.3f5 is stuck undersized for 4191.587520, current state
active+undersized+degraded+remapped+backfilling, last acting [13,21]
    pg 0.3f7 is stuck undersized for 10639.374420, current state
active+undersized+degraded+remapped+backfill_wait, last acting [14,2]
    pg 0.3fa is stuck undersized for 10640.425955, current state
active+undersized+degraded+remapped+backfill_wait, last acting [3,11]
    pg 0.3fe is stuck undersized for 3939.552615, current state
active+undersized+degraded+remapped+backfill_wait, last acting [7,27]
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
    application not enabled on pool 'rbdbench'
    use 'ceph osd pool application enable <pool-name> <app-name>',
where <app-name> is 'cephfs', 'rbd', 'rgw', or freeform for custom
applications.
REQUEST_SLOW 29 slow requests are blocked > 32 sec. Implicated osds
    29 ops are blocked > 2097.15 sec
REQUEST_STUCK 29 stuck requests are blocked > 4096 sec. Implicated osds 2,19,24
    29 ops are blocked > 4194.3 sec
    osds 2,19,24 have stuck requests > 4194.3 sec
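
For reference, a minimal sketch of how the acting sets in the output above relate to min_size. It rests on the general Ceph rule that a PG only serves I/O while at least min_size replicas are in its acting set, which would explain why the PGs with a single acting OSD show "peered" instead of "active". The sample lines are copied from the health detail above (re-joined where the mail wrapped them); the names MIN_SIZE, SAMPLE, PG_LINE and classify are made up for this illustration and are not part of any Ceph tool.

```python
import re

# min_size as stated in the post; adjust if the pool is configured differently.
MIN_SIZE = 2

# Three lines taken from the health detail above, unwrapped to one line each.
SAMPLE = """\
pg 0.4 is stuck inactive for 4227.236803, current state undersized+degraded+remapped+backfilling+peered, last acting [19]
pg 0.17f is stuck undersized for 3811.397010, current state active+undersized+degraded+remapped+backfill_wait, last acting [19,17]
pg 0.17a is active+undersized+degraded+remapped+backfilling, acting [24,4]
"""

# Captures the PG id and the comma-separated OSD ids of the acting set.
PG_LINE = re.compile(r"pg (\S+) is .*?acting \[([\d,]*)\]")


def classify(text, min_size):
    """Return {pgid: (acting_osds, meets_min_size)} for every pg line in text."""
    result = {}
    for line in text.splitlines():
        match = PG_LINE.search(line)
        if not match:
            continue
        pgid, osds = match.groups()
        acting = [int(osd) for osd in osds.split(",") if osd]
        result[pgid] = (acting, len(acting) >= min_size)
    return result


for pgid, (acting, ok) in classify(SAMPLE, MIN_SIZE).items():
    verdict = "meets min_size" if ok else "below min_size, cannot go active"
    print(pgid, acting, verdict)
```

Under these assumptions, all 19 inactive PGs listed above would fall into the "below min_size" group, since each has exactly one OSD in its acting set, while the PGs with two acting OSDs remain active but undersized.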

Thanks,
Mario
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com