help! pg inactive and slow requests after filestore to bluestore migration, version 12.2.12


Hello everybody,

[fix confusing typo]

I have a three-node Ceph cluster. Each node has an E3-1220v3, 24GB RAM, 6 HDD OSDs with 32GB Intel Optane NVMe journal, and 10Gb networking.

I wanted to move to bluestore because filestore support is being dropped. Our cluster was working fine with filestore, and we could take complete nodes out for maintenance without issues.

root@ceph04:~# ceph osd pool get libvirt-pool size
size: 3
root@ceph04:~# ceph osd pool get libvirt-pool min_size
min_size: 2

I removed all OSDs from one node and zapped the OSD and journal devices. We then recreated the OSDs as bluestore, using a small 5GB partition as block device instead of a journal for each OSD.

After that I saw the cluster suffer from inactive PGs and slow requests.

I tried setting the following on all nodes, but it made no difference:
ceph tell osd.* injectargs '--osd_recovery_max_active 1'
ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
ceph tell osd.* injectargs '--osd_recovery_sleep 0.3'
systemctl restart ceph-osd.target
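
Values set with injectargs only apply to the running daemons, so the restart on the last line presumably reverted them to whatever ceph.conf specifies. One way to check which values the OSDs are actually running with is the admin socket, queried on the host that runs each OSD. The sketch below only prints the commands; the helper name print_recovery_checks is made up for illustration, and the OSD ids are the ones listed for ceph04 in the osd tree further down.

```shell
# Print one admin-socket query per OSD id and recovery option.
# Nothing is executed here; pipe the output to "sh" on the OSD host to run it.
print_recovery_checks() {
    for id in "$@"; do
        for opt in osd_recovery_max_active osd_recovery_op_priority osd_recovery_sleep; do
            echo "ceph daemon osd.$id config get $opt"
        done
    done
}

# OSD ids hosted on ceph04 according to "ceph osd tree"
print_recovery_checks 0 1 2 3 14 15
```

To keep the settings across restarts they would have to go into the [osd] section of ceph.conf instead of being injected.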

How can I migrate to bluestore without inactive PGs or slow requests? I have several more filestore clusters, and I would like to know how to migrate them without running into the same problem.

As a side question: I optimized our cluster for filestore, and the Intel Optane NVMe journals showed good results in fio dsync write tests. Does bluestore also use dsync writes for its RocksDB caching, or can we select NVMe devices based on other specifications? My tests with filestore showed that the Optane NVMe SSD was faster than the Samsung 970 Pro NVMe SSD, and I only need a few GB for filestore journals. With bluestore's RocksDB caching the situation is different, and I can't find documentation on how to speed-test NVMe devices for bluestore.
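
For reference, the kind of fio dsync write test I mean for filestore journals looks roughly like the sketch below: 4k sequential writes with O_DIRECT and O_SYNC at queue depth 1. Whether this pattern is representative for the bluestore RocksDB device is exactly what I don't know. The helper name fio_sync_write_cmd and the device path are placeholders, and the helper only prints the command, because running it against a raw device destroys the data on it.

```shell
# Build (but do not run) a fio command line for a synchronous 4k write test.
# $1: target device or file (placeholder, pick a device you can overwrite)
# $2: runtime in seconds, defaults to 60
fio_sync_write_cmd() {
    dev="$1"
    runtime="${2:-60}"
    printf 'fio --name=sync-write-test --filename=%s --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=%s --time_based\n' "$dev" "$runtime"
}

# Hypothetical spare partition on the NVMe device
fio_sync_write_cmd /dev/nvme0n1p9
```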

Kind regards,

Jelle

root@ceph04:~# ceph osd tree
ID CLASS WEIGHT   TYPE NAME       STATUS REWEIGHT PRI-AFF
-1       60.04524 root default
-2       20.01263     host ceph04
 0   hdd  2.72899         osd.0       up  1.00000 1.00000
 1   hdd  2.72899         osd.1       up  1.00000 1.00000
 2   hdd  5.45799         osd.2       up  1.00000 1.00000
 3   hdd  2.72899         osd.3       up  1.00000 1.00000
14   hdd  3.63869         osd.14      up  1.00000 1.00000
15   hdd  2.72899         osd.15      up  1.00000 1.00000
-3       20.01263     host ceph05
 4   hdd  5.45799         osd.4       up  1.00000 1.00000
 5   hdd  2.72899         osd.5       up  1.00000 1.00000
 6   hdd  2.72899         osd.6       up  1.00000 1.00000
13   hdd  3.63869         osd.13      up  1.00000 1.00000
16   hdd  2.72899         osd.16      up  1.00000 1.00000
18   hdd  2.72899         osd.18      up  1.00000 1.00000
-4       20.01997     host ceph06
 8   hdd  5.45999         osd.8       up  1.00000 1.00000
 9   hdd  2.73000         osd.9       up  1.00000 1.00000
10   hdd  2.73000         osd.10      up  1.00000 1.00000
11   hdd  2.73000         osd.11      up  1.00000 1.00000
12   hdd  3.64000         osd.12      up  1.00000 1.00000
17   hdd  2.73000         osd.17      up  1.00000 1.00000


root@ceph04:~# ceph status
  cluster:
    id:     85873cda-4865-4147-819d-8deda5345db5
    health: HEALTH_WARN
            18962/11801097 objects misplaced (0.161%)
            1/3933699 objects unfound (0.000%)
            Reduced data availability: 42 pgs inactive
            Degraded data redundancy: 3645135/11801097 objects degraded (30.888%), 959 pgs degraded, 960 pgs undersized
            110 slow requests are blocked > 32 sec. Implicated osds 3,10,11

  services:
    mon: 3 daemons, quorum ceph04,ceph05,ceph06
    mgr: ceph04(active), standbys: ceph06, ceph05
    osd: 18 osds: 18 up, 18 in; 964 remapped pgs

  data:
    pools:   1 pools, 1024 pgs
    objects: 3.93M objects, 15.0TiB
    usage:   31.2TiB used, 28.8TiB / 60.0TiB avail
    pgs:     4.102% pgs not active
             3645135/11801097 objects degraded (30.888%)
             18962/11801097 objects misplaced (0.161%)
             1/3933699 objects unfound (0.000%)
             913 active+undersized+degraded+remapped+backfill_wait
             60  active+clean
             41  activating+undersized+degraded+remapped
             4   active+remapped+backfill_wait
             4   active+undersized+degraded+remapped+backfilling
             1   undersized+degraded+remapped+backfilling+peered
             1   active+recovery_wait+undersized+remapped

  io:
    recovery: 197MiB/s, 49objects/s
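
As a sanity check on the numbers above: the denominator 11801097 is the object count times the pool size of 3, and the degraded share is a bit under one third, which fits one of three nodes having been emptied and rebuilt. A small sketch of that arithmetic (the function names are only for illustration):

```shell
# Total object copies = objects * replica count
total_copies() {
    echo $(( $1 * $2 ))
}

# Degraded share in percent, three decimals
degraded_pct() {
    awk -v d="$1" -v c="$2" 'BEGIN { printf "%.3f\n", 100 * d / c }'
}

total_copies 3933699 3           # prints 11801097
degraded_pct 3645135 11801097    # prints 30.888
```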


root@ceph04:~# ceph health detail
HEALTH_WARN 18962/11801097 objects misplaced (0.161%); 1/3933699 objects unfound (0.000%); Reduced data availability: 42 pgs inactive; Degraded data redundancy: 3643636/11801097 objects degraded (30.875%), 959 pgs degraded, 960 pgs undersized; 110 slow requests are blocked > 32 sec. Implicated osds 3,10,11
OBJECT_MISPLACED 18962/11801097 objects misplaced (0.161%)
OBJECT_UNFOUND 1/3933699 objects unfound (0.000%)
    pg 3.361 has 1 unfound objects
PG_AVAILABILITY Reduced data availability: 42 pgs inactive
    pg 3.26 is stuck inactive for 19268.231084, current state activating+undersized+degraded+remapped, last acting [9,2]
    pg 3.33 is stuck inactive for 20788.205717, current state activating+undersized+degraded+remapped, last acting [3,10]
    pg 3.44 is stuck inactive for 24626.274351, current state activating+undersized+degraded+remapped, last acting [10,2]
    pg 3.83 is stuck inactive for 21008.265302, current state activating+undersized+degraded+remapped, last acting [17,0]
    pg 3.89 is stuck inactive for 24626.266516, current state activating+undersized+degraded+remapped, last acting [15,17]
    pg 3.a2 is stuck inactive for 24627.362587, current state activating+undersized+degraded+remapped, last acting [8,14]
    pg 3.a6 is stuck inactive for 24626.330592, current state activating+undersized+degraded+remapped, last acting [3,8]
    pg 3.b0 is stuck inactive for 20403.384828, current state activating+undersized+degraded+remapped, last acting [3,12]
    pg 3.e6 is stuck inactive for 20788.175811, current state activating+undersized+degraded+remapped, last acting [3,10]
    pg 3.e8 is stuck inactive for 1011.080905, current state undersized+degraded+remapped+backfilling+peered, last acting [10]
    pg 3.12e is stuck inactive for 24626.236657, current state activating+undersized+degraded+remapped, last acting [11,2]
    pg 3.135 is stuck inactive for 21008.245618, current state activating+undersized+degraded+remapped, last acting [8,14]
    pg 3.14a is stuck inactive for 24626.319956, current state activating+undersized+degraded+remapped, last acting [12,0]
    pg 3.159 is stuck inactive for 20403.344759, current state activating+undersized+degraded+remapped, last acting [14,12]
    pg 3.15b is stuck inactive for 21008.251625, current state activating+undersized+degraded+remapped, last acting [17,14]
    pg 3.17a is stuck inactive for 20403.369711, current state activating+undersized+degraded+remapped, last acting [15,12]
    pg 3.1ac is stuck inactive for 21008.255550, current state activating+undersized+degraded+remapped, last acting [3,10]
    pg 3.1ae is stuck inactive for 24626.268989, current state activating+undersized+degraded+remapped, last acting [0,8]
    pg 3.1b6 is stuck inactive for 24626.187356, current state activating+undersized+degraded+remapped, last acting [2,11]
    pg 3.1c2 is stuck inactive for 24626.342254, current state activating+undersized+degraded+remapped, last acting [12,2]
    pg 3.1cb is stuck inactive for 21008.294034, current state activating+undersized+degraded+remapped, last acting [10,2]
    pg 3.1d9 is stuck inactive for 24626.232616, current state activating+undersized+degraded+remapped, last acting [11,2]
    pg 3.1fd is stuck inactive for 24626.196421, current state activating+undersized+degraded+remapped, last acting [2,8]
    pg 3.240 is stuck inactive for 20788.155859, current state activating+undersized+degraded+remapped, last acting [8,14]
    pg 3.253 is stuck inactive for 20403.371954, current state activating+undersized+degraded+remapped, last acting [10,2]
    pg 3.275 is stuck inactive for 24626.345347, current state activating+undersized+degraded+remapped, last acting [17,2]
    pg 3.297 is stuck inactive for 20788.175507, current state activating+undersized+degraded+remapped, last acting [2,8]
    pg 3.2b9 is stuck inactive for 19268.208986, current state activating+undersized+degraded+remapped, last acting [8,2]
    pg 3.2d6 is stuck inactive for 24626.284743, current state activating+undersized+degraded+remapped, last acting [2,12]
    pg 3.2fc is stuck inactive for 24627.370829, current state activating+undersized+degraded+remapped, last acting [14,9]
    pg 3.30d is stuck inactive for 24626.321020, current state activating+undersized+degraded+remapped, last acting [8,14]
    pg 3.335 is stuck inactive for 24626.185613, current state activating+undersized+degraded+remapped, last acting [14,17]
    pg 3.336 is stuck inactive for 24626.290136, current state activating+undersized+degraded+remapped, last acting [10,3]
    pg 3.33e is stuck inactive for 21008.221375, current state activating+undersized+degraded+remapped, last acting [8,2]
    pg 3.357 is stuck inactive for 24627.375754, current state activating+undersized+degraded+remapped, last acting [1,11]
    pg 3.369 is stuck inactive for 24626.198568, current state activating+undersized+degraded+remapped, last acting [2,8]
    pg 3.374 is stuck inactive for 24626.196342, current state activating+undersized+degraded+remapped, last acting [8,2]
    pg 3.388 is stuck inactive for 19268.180538, current state activating+undersized+degraded+remapped, last acting [2,12]
    pg 3.38a is stuck inactive for 24626.281415, current state activating+undersized+degraded+remapped, last acting [1,17]
    pg 3.3ac is stuck inactive for 24626.195233, current state activating+undersized+degraded+remapped, last acting [2,8]
    pg 3.3e5 is stuck inactive for 24626.262231, current state activating+undersized+degraded+remapped, last acting [0,9]
    pg 3.3e9 is stuck inactive for 24626.302015, current state activating+undersized+degraded+remapped, last acting [10,0]
PG_DEGRADED Degraded data redundancy: 3643636/11801097 objects degraded (30.875%), 959 pgs degraded, 960 pgs undersized
    pg 3.3ca is active+undersized+degraded+remapped+backfill_wait, acting [2,8]
    pg 3.3cb is stuck undersized for 959.501364, current state active+undersized+degraded+remapped+backfill_wait, last acting [2,12]
    pg 3.3cc is stuck undersized for 980.961676, current state active+undersized+degraded+remapped+backfill_wait, last acting [3,11]
    pg 3.3cd is stuck undersized for 979.929226, current state active+undersized+degraded+remapped+backfill_wait, last acting [17,3]
    pg 3.3ce is stuck undersized for 979.971186, current state active+undersized+degraded+remapped+backfill_wait, last acting [17,15]
    pg 3.3cf is stuck undersized for 970.634412, current state active+undersized+degraded+remapped+backfill_wait, last acting [9,14]
    pg 3.3d0 is stuck undersized for 960.511363, current state active+undersized+degraded+remapped+backfill_wait, last acting [2,8]
    pg 3.3d1 is stuck undersized for 970.633580, current state active+undersized+degraded+remapped+backfill_wait, last acting [9,14]
    pg 3.3d2 is stuck undersized for 959.524726, current state active+undersized+degraded+remapped+backfill_wait, last acting [10,2]
    pg 3.3d3 is stuck undersized for 979.992415, current state active+undersized+degraded+remapped+backfill_wait, last acting [1,17]
    pg 3.3d4 is stuck undersized for 959.518829, current state active+undersized+degraded+remapped+backfill_wait, last acting [2,17]
    pg 3.3d5 is stuck undersized for 980.997842, current state active+undersized+degraded+remapped+backfill_wait, last acting [11,1]
    pg 3.3d6 is stuck undersized for 980.877456, current state active+undersized+degraded+remapped+backfill_wait, last acting [3,8]
    pg 3.3d7 is stuck undersized for 980.988891, current state active+undersized+degraded+remapped+backfill_wait, last acting [11,15]
    pg 3.3d8 is stuck undersized for 960.491701, current state active+undersized+degraded+remapped+backfill_wait, last acting [2,8]
    pg 3.3d9 is stuck undersized for 960.549814, current state active+undersized+degraded+remapped+backfill_wait, last acting [2,8]
    pg 3.3da is stuck undersized for 959.502189, current state active+undersized+degraded+remapped+backfill_wait, last acting [12,2]
    pg 3.3db is stuck undersized for 980.971951, current state active+undersized+degraded+remapped+backfill_wait, last acting [3,11]
    pg 3.3dc is stuck undersized for 971.634994, current state active+undersized+degraded+remapped+backfill_wait, last acting [8,14]
    pg 3.3dd is stuck undersized for 981.001465, current state active+undersized+degraded+remapped+backfill_wait, last acting [1,8]
    pg 3.3de is stuck undersized for 959.223153, current state active+undersized+degraded+remapped+backfill_wait, last acting [9,2]
    pg 3.3df is stuck undersized for 980.994984, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,8]
    pg 3.3e0 is stuck undersized for 970.654694, current state active+undersized+degraded+remapped+backfill_wait, last acting [10,14]
    pg 3.3e1 is stuck undersized for 980.977833, current state active+undersized+degraded+remapped+backfill_wait, last acting [3,11]
    pg 3.3e2 is stuck undersized for 960.508628, current state active+undersized+degraded+remapped+backfill_wait, last acting [8,2]
    pg 3.3e4 is stuck undersized for 980.880956, current state active+undersized+degraded+remapped+backfill_wait, last acting [3,8]
    pg 3.3e5 is stuck undersized for 979.984434, current state activating+undersized+degraded+remapped, last acting [0,9]
    pg 3.3e6 is stuck undersized for 981.003536, current state active+undersized+degraded+remapped+backfill_wait, last acting [11,3]
    pg 3.3e7 is stuck undersized for 970.628714, current state active+undersized+degraded+remapped+backfill_wait, last acting [12,14]
    pg 3.3e8 is stuck undersized for 959.515353, current state active+undersized+degraded+remapped+backfill_wait, last acting [2,12]
    pg 3.3e9 is stuck undersized for 980.011154, current state activating+undersized+degraded+remapped, last acting [10,0]
    pg 3.3ea is stuck undersized for 979.937856, current state active+undersized+degraded+remapped+backfill_wait, last acting [0,12]
    pg 3.3eb is stuck undersized for 959.500630, current state active+undersized+degraded+remapped+backfill_wait, last acting [12,2]
    pg 3.3ec is stuck undersized for 980.013765, current state active+undersized+degraded+remapped+backfill_wait, last acting [3,10]
    pg 3.3ed is stuck undersized for 970.644391, current state active+undersized+degraded+remapped+backfill_wait, last acting [14,17]
    pg 3.3ee is stuck undersized for 979.970712, current state active+undersized+degraded+remapped+backfill_wait, last acting [9,0]
    pg 3.3f0 is stuck undersized for 959.514173, current state active+undersized+degraded+remapped+backfill_wait, last acting [10,2]
    pg 3.3f1 is stuck undersized for 970.611444, current state active+undersized+degraded+remapped+backfill_wait, last acting [9,14]
    pg 3.3f2 is stuck undersized for 979.995013, current state active+undersized+degraded+remapped+backfill_wait, last acting [1,10]
    pg 3.3f3 is stuck undersized for 959.219621, current state active+undersized+degraded+remapped+backfill_wait, last acting [2,9]
    pg 3.3f4 is stuck undersized for 970.631428, current state active+undersized+degraded+remapped+backfill_wait, last acting [12,14]
    pg 3.3f5 is stuck undersized for 959.504461, current state active+undersized+degraded+remapped+backfill_wait, last acting [10,2]
    pg 3.3f7 is stuck undersized for 970.645735, current state active+undersized+degraded+remapped+backfill_wait, last acting [14,12]
    pg 3.3f8 is stuck undersized for 960.489631, current state active+undersized+degraded+remapped+backfill_wait, last acting [2,8]
    pg 3.3f9 is stuck undersized for 979.957529, current state active+undersized+degraded+remapped+backfill_wait, last acting [9,15]
    pg 3.3fa is stuck undersized for 979.967269, current state active+undersized+degraded+remapped+backfill_wait, last acting [12,3]
    pg 3.3fb is stuck undersized for 981.001507, current state active+undersized+degraded+remapped+backfill_wait, last acting [15,11]
    pg 3.3fc is stuck undersized for 960.514524, current state active+undersized+degraded+remapped+backfill_wait, last acting [2,11]
    pg 3.3fd is stuck undersized for 960.542219, current state active+undersized+degraded+remapped+backfill_wait, last acting [11,2]
    pg 3.3fe is stuck undersized for 959.488418, current state active+undersized+degraded+remapped+backfill_wait, last acting [10,2]
    pg 3.3ff is stuck undersized for 970.660468, current state active+undersized+degraded+remapped+backfill_wait, last acting [9,14]
REQUEST_SLOW 110 slow requests are blocked > 32 sec. Implicated osds 3,10,11
    110 ops are blocked > 1048.58 sec
    osds 3,10,11 have blocked requests > 1048.58 sec