Hello everybody,
I have a three-node Ceph cluster. Each node has an E3-1220v3, 24GB RAM and 6 HDD OSDs
with a 32GB Intel Optane NVMe journal, connected by 10Gb networking.
I wanted to move to BlueStore because FileStore support is being dropped. Our
cluster was working fine with FileStore, and we could take complete nodes
out for maintenance without issues.
root@ceph04:~# ceph osd pool get libvirt-pool size
size: 3
root@ceph04:~# ceph osd pool get libvirt-pool min_size
min_size: 2
I removed all OSDs from one node, zapped the OSD and journal devices,
and recreated the OSDs as BlueStore, using a small 5GB partition per OSD as
the block device instead of a journal.
After that the cluster suffered from inactive PGs and slow requests.
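For comparison with the whole-node rebuild described above, the conversion can also be done one OSD at a time. The sketch below loosely follows the "mark out and replace" method from the upstream BlueStore migration documentation: the OSD is marked out and nothing is wiped until `ceph osd safe-to-destroy` reports that its data is stored elsewhere. The helper name `rebuild_osd_bluestore`, the `RUN` dry-run switch and the device paths are hypothetical placeholders, and placing the 5GB NVMe partition as `--block.db` is an assumption about what the partition was meant for.

```shell
#!/bin/sh
# Hypothetical sketch: rebuild one FileStore OSD as BlueStore, reusing its id.
# Set RUN=echo to print the commands instead of executing them.
RUN=${RUN:-}

rebuild_osd_bluestore() {
  id=$1    # OSD id, e.g. 0
  data=$2  # HDD holding the OSD data (placeholder, e.g. /dev/sdb)
  db=$3    # NVMe partition for block.db (placeholder, e.g. /dev/nvme0n1p1)

  # Let CRUSH move this OSD's PGs elsewhere before touching the disk.
  $RUN ceph osd out "$id"
  # Poll until the cluster says the OSD's data is safe to remove.
  until $RUN ceph osd safe-to-destroy "$id"; do sleep 60; done
  $RUN systemctl stop "ceph-osd@$id"
  $RUN umount "/var/lib/ceph/osd/ceph-$id"
  # Wipe the old FileStore data disk.
  $RUN ceph-volume lvm zap "$data"
  # Keep the id in the CRUSH map so the new OSD takes its place.
  $RUN ceph osd destroy "$id" --yes-i-really-mean-it
  $RUN ceph-volume lvm create --bluestore --data "$data" \
    --block.db "$db" --osd-id "$id"
}
```

A dry run such as `RUN=echo rebuild_osd_bluestore 0 /dev/sdb /dev/nvme0n1p1` prints the command sequence for review without changing anything.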
I tried setting the following on all nodes, but it made no difference:
ceph tell osd.* injectargs '--osd_recovery_max_active 1'
ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
ceph tell osd.* injectargs '--osd_recovery_sleep 0.3'
systemctl restart ceph-osd.target
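Values set with `injectargs` only change the running daemons; after an OSD restart they come from `ceph.conf` (or the defaults) again. A quick way to see what the OSDs are actually using is to query each local OSD's admin socket. The helper `show_recovery_settings` and the `CEPH` dry-run variable below are hypothetical; `ceph daemon osd.N config get <option>` has to be run on the host where that OSD lives.

```shell
#!/bin/sh
# Hypothetical helper: print the live recovery throttles of local OSDs.
# Set CEPH="echo ceph" to print the commands instead of running them.
CEPH=${CEPH:-ceph}

show_recovery_settings() {
  for id in "$@"; do
    for opt in osd_recovery_max_active osd_recovery_op_priority osd_recovery_sleep; do
      # Prefix each answer with the OSD it came from.
      printf 'osd.%s ' "$id"
      $CEPH daemon "osd.$id" config get "$opt"
    done
  done
}
```

On ceph04, for example, `show_recovery_settings 0 1 2 3 14 15` covers the OSDs listed under that host in the `ceph osd tree` output.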
How can I migrate to BlueStore without inactive PGs or slow requests? I
have several more FileStore clusters that I would like to migrate the
same way.
As a side question: I optimized our cluster for FileStore, and the Intel
Optane NVMe journals did well in fio dsync write tests. Does BlueStore
also use dsync writes for its block caching, or should NVMe devices be
selected on other criteria? My FileStore tests showed that the Optane NVMe
SSD was faster than the Samsung 970 Pro NVMe SSD, and I only need a few
GB for FileStore journals. With BlueStore block caching the situation is
different, and I can't find documentation on how to benchmark NVMe
devices for BlueStore.
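A single-job, queue-depth-1 synchronous write run is the usual shape of a journal qualification test with fio. The sketch below assumes that shape, with 4k blocks by default and the block size as a parameter so the same run can be repeated at other sizes; which pattern best predicts BlueStore DB/WAL behaviour is the open question above. The helper name `sync_write_test`, the `FIO` dry-run variable and the device path are placeholders. It writes straight to the device and destroys any data on it.

```shell
#!/bin/sh
# Hypothetical sketch of a synchronous-write fio run against a raw device.
# WARNING: overwrites the target device. Set FIO="echo fio" for a dry run.
FIO=${FIO:-fio}

sync_write_test() {
  dev=$1          # raw device to test (placeholder, e.g. /dev/nvme0n1)
  bs=${2:-4k}     # block size, 4k unless given
  # --direct=1 bypasses the page cache, --sync=1 requests synchronous writes,
  # one job at queue depth 1 mimics a single journal writer.
  $FIO --name=sync-write --filename="$dev" --direct=1 --sync=1 \
    --rw=write --bs="$bs" --numjobs=1 --iodepth=1 \
    --runtime=60 --time_based --group_reporting
}
```

For example, `sync_write_test /dev/nvme0n1 64k` repeats the run with 64k writes.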
Kind regards,
Jelle
root@ceph04:~# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 60.04524 root default
-2 20.01263 host ceph04
0 hdd 2.72899 osd.0 up 1.00000 1.00000
1 hdd 2.72899 osd.1 up 1.00000 1.00000
2 hdd 5.45799 osd.2 up 1.00000 1.00000
3 hdd 2.72899 osd.3 up 1.00000 1.00000
14 hdd 3.63869 osd.14 up 1.00000 1.00000
15 hdd 2.72899 osd.15 up 1.00000 1.00000
-3 20.01263 host ceph05
4 hdd 5.45799 osd.4 up 1.00000 1.00000
5 hdd 2.72899 osd.5 up 1.00000 1.00000
6 hdd 2.72899 osd.6 up 1.00000 1.00000
13 hdd 3.63869 osd.13 up 1.00000 1.00000
16 hdd 2.72899 osd.16 up 1.00000 1.00000
18 hdd 2.72899 osd.18 up 1.00000 1.00000
-4 20.01997 host ceph06
8 hdd 5.45999 osd.8 up 1.00000 1.00000
9 hdd 2.73000 osd.9 up 1.00000 1.00000
10 hdd 2.73000 osd.10 up 1.00000 1.00000
11 hdd 2.73000 osd.11 up 1.00000 1.00000
12 hdd 3.64000 osd.12 up 1.00000 1.00000
17 hdd 2.73000 osd.17 up 1.00000 1.00000
root@ceph04:~# ceph status
cluster:
id: 85873cda-4865-4147-819d-8deda5345db5
health: HEALTH_WARN
18962/11801097 objects misplaced (0.161%)
1/3933699 objects unfound (0.000%)
Reduced data availability: 42 pgs inactive
Degraded data redundancy: 3645135/11801097 objects degraded
(30.888%), 959 pgs degraded, 960 pgs undersized
110 slow requests are blocked > 32 sec. Implicated osds 3,10,11
services:
mon: 3 daemons, quorum ceph04,ceph05,ceph06
mgr: ceph04(active), standbys: ceph06, ceph05
osd: 18 osds: 18 up, 18 in; 964 remapped pgs
data:
pools: 1 pools, 1024 pgs
objects: 3.93M objects, 15.0TiB
usage: 31.2TiB used, 28.8TiB / 60.0TiB avail
pgs: 4.102% pgs not active
3645135/11801097 objects degraded (30.888%)
18962/11801097 objects misplaced (0.161%)
1/3933699 objects unfound (0.000%)
913 active+undersized+degraded+remapped+backfill_wait
60 active+clean
41 activating+undersized+degraded+remapped
4 active+remapped+backfill_wait
4 active+undersized+degraded+remapped+backfilling
1 undersized+degraded+remapped+backfilling+peered
1 active+recovery_wait+undersized+remapped
io:
recovery: 197MiB/s, 49objects/s
root@ceph04:~# ceph health detail
HEALTH_WARN 18962/11801097 objects misplaced (0.161%); 1/3933699 objects
unfound (0.000%); Reduced data availability: 42 pgs inactive; Degraded
data redundancy: 3643636/11801097 objects degraded (30.875%), 959 pgs
degraded, 960 pgs undersized; 110 slow requests are blocked > 32 sec.
Implicated osds 3,10,11
OBJECT_MISPLACED 18962/11801097 objects misplaced (0.161%)
OBJECT_UNFOUND 1/3933699 objects unfound (0.000%)
pg 3.361 has 1 unfound objects
PG_AVAILABILITY Reduced data availability: 42 pgs inactive
pg 3.26 is stuck inactive for 19268.231084, current state
activating+undersized+degraded+remapped, last acting [9,2]
pg 3.33 is stuck inactive for 20788.205717, current state
activating+undersized+degraded+remapped, last acting [3,10]
pg 3.44 is stuck inactive for 24626.274351, current state
activating+undersized+degraded+remapped, last acting [10,2]
pg 3.83 is stuck inactive for 21008.265302, current state
activating+undersized+degraded+remapped, last acting [17,0]
pg 3.89 is stuck inactive for 24626.266516, current state
activating+undersized+degraded+remapped, last acting [15,17]
pg 3.a2 is stuck inactive for 24627.362587, current state
activating+undersized+degraded+remapped, last acting [8,14]
pg 3.a6 is stuck inactive for 24626.330592, current state
activating+undersized+degraded+remapped, last acting [3,8]
pg 3.b0 is stuck inactive for 20403.384828, current state
activating+undersized+degraded+remapped, last acting [3,12]
pg 3.e6 is stuck inactive for 20788.175811, current state
activating+undersized+degraded+remapped, last acting [3,10]
pg 3.e8 is stuck inactive for 1011.080905, current state
undersized+degraded+remapped+backfilling+peered, last acting [10]
pg 3.12e is stuck inactive for 24626.236657, current state
activating+undersized+degraded+remapped, last acting [11,2]
pg 3.135 is stuck inactive for 21008.245618, current state
activating+undersized+degraded+remapped, last acting [8,14]
pg 3.14a is stuck inactive for 24626.319956, current state
activating+undersized+degraded+remapped, last acting [12,0]
pg 3.159 is stuck inactive for 20403.344759, current state
activating+undersized+degraded+remapped, last acting [14,12]
pg 3.15b is stuck inactive for 21008.251625, current state
activating+undersized+degraded+remapped, last acting [17,14]
pg 3.17a is stuck inactive for 20403.369711, current state
activating+undersized+degraded+remapped, last acting [15,12]
pg 3.1ac is stuck inactive for 21008.255550, current state
activating+undersized+degraded+remapped, last acting [3,10]
pg 3.1ae is stuck inactive for 24626.268989, current state
activating+undersized+degraded+remapped, last acting [0,8]
pg 3.1b6 is stuck inactive for 24626.187356, current state
activating+undersized+degraded+remapped, last acting [2,11]
pg 3.1c2 is stuck inactive for 24626.342254, current state
activating+undersized+degraded+remapped, last acting [12,2]
pg 3.1cb is stuck inactive for 21008.294034, current state
activating+undersized+degraded+remapped, last acting [10,2]
pg 3.1d9 is stuck inactive for 24626.232616, current state
activating+undersized+degraded+remapped, last acting [11,2]
pg 3.1fd is stuck inactive for 24626.196421, current state
activating+undersized+degraded+remapped, last acting [2,8]
pg 3.240 is stuck inactive for 20788.155859, current state
activating+undersized+degraded+remapped, last acting [8,14]
pg 3.253 is stuck inactive for 20403.371954, current state
activating+undersized+degraded+remapped, last acting [10,2]
pg 3.275 is stuck inactive for 24626.345347, current state
activating+undersized+degraded+remapped, last acting [17,2]
pg 3.297 is stuck inactive for 20788.175507, current state
activating+undersized+degraded+remapped, last acting [2,8]
pg 3.2b9 is stuck inactive for 19268.208986, current state
activating+undersized+degraded+remapped, last acting [8,2]
pg 3.2d6 is stuck inactive for 24626.284743, current state
activating+undersized+degraded+remapped, last acting [2,12]
pg 3.2fc is stuck inactive for 24627.370829, current state
activating+undersized+degraded+remapped, last acting [14,9]
pg 3.30d is stuck inactive for 24626.321020, current state
activating+undersized+degraded+remapped, last acting [8,14]
pg 3.335 is stuck inactive for 24626.185613, current state
activating+undersized+degraded+remapped, last acting [14,17]
pg 3.336 is stuck inactive for 24626.290136, current state
activating+undersized+degraded+remapped, last acting [10,3]
pg 3.33e is stuck inactive for 21008.221375, current state
activating+undersized+degraded+remapped, last acting [8,2]
pg 3.357 is stuck inactive for 24627.375754, current state
activating+undersized+degraded+remapped, last acting [1,11]
pg 3.369 is stuck inactive for 24626.198568, current state
activating+undersized+degraded+remapped, last acting [2,8]
pg 3.374 is stuck inactive for 24626.196342, current state
activating+undersized+degraded+remapped, last acting [8,2]
pg 3.388 is stuck inactive for 19268.180538, current state
activating+undersized+degraded+remapped, last acting [2,12]
pg 3.38a is stuck inactive for 24626.281415, current state
activating+undersized+degraded+remapped, last acting [1,17]
pg 3.3ac is stuck inactive for 24626.195233, current state
activating+undersized+degraded+remapped, last acting [2,8]
pg 3.3e5 is stuck inactive for 24626.262231, current state
activating+undersized+degraded+remapped, last acting [0,9]
pg 3.3e9 is stuck inactive for 24626.302015, current state
activating+undersized+degraded+remapped, last acting [10,0]
PG_DEGRADED Degraded data redundancy: 3643636/11801097 objects degraded
(30.875%), 959 pgs degraded, 960 pgs undersized
pg 3.3ca is active+undersized+degraded+remapped+backfill_wait,
acting [2,8]
pg 3.3cb is stuck undersized for 959.501364, current state
active+undersized+degraded+remapped+backfill_wait, last acting [2,12]
pg 3.3cc is stuck undersized for 980.961676, current state
active+undersized+degraded+remapped+backfill_wait, last acting [3,11]
pg 3.3cd is stuck undersized for 979.929226, current state
active+undersized+degraded+remapped+backfill_wait, last acting [17,3]
pg 3.3ce is stuck undersized for 979.971186, current state
active+undersized+degraded+remapped+backfill_wait, last acting [17,15]
pg 3.3cf is stuck undersized for 970.634412, current state
active+undersized+degraded+remapped+backfill_wait, last acting [9,14]
pg 3.3d0 is stuck undersized for 960.511363, current state
active+undersized+degraded+remapped+backfill_wait, last acting [2,8]
pg 3.3d1 is stuck undersized for 970.633580, current state
active+undersized+degraded+remapped+backfill_wait, last acting [9,14]
pg 3.3d2 is stuck undersized for 959.524726, current state
active+undersized+degraded+remapped+backfill_wait, last acting [10,2]
pg 3.3d3 is stuck undersized for 979.992415, current state
active+undersized+degraded+remapped+backfill_wait, last acting [1,17]
pg 3.3d4 is stuck undersized for 959.518829, current state
active+undersized+degraded+remapped+backfill_wait, last acting [2,17]
pg 3.3d5 is stuck undersized for 980.997842, current state
active+undersized+degraded+remapped+backfill_wait, last acting [11,1]
pg 3.3d6 is stuck undersized for 980.877456, current state
active+undersized+degraded+remapped+backfill_wait, last acting [3,8]
pg 3.3d7 is stuck undersized for 980.988891, current state
active+undersized+degraded+remapped+backfill_wait, last acting [11,15]
pg 3.3d8 is stuck undersized for 960.491701, current state
active+undersized+degraded+remapped+backfill_wait, last acting [2,8]
pg 3.3d9 is stuck undersized for 960.549814, current state
active+undersized+degraded+remapped+backfill_wait, last acting [2,8]
pg 3.3da is stuck undersized for 959.502189, current state
active+undersized+degraded+remapped+backfill_wait, last acting [12,2]
pg 3.3db is stuck undersized for 980.971951, current state
active+undersized+degraded+remapped+backfill_wait, last acting [3,11]
pg 3.3dc is stuck undersized for 971.634994, current state
active+undersized+degraded+remapped+backfill_wait, last acting [8,14]
pg 3.3dd is stuck undersized for 981.001465, current state
active+undersized+degraded+remapped+backfill_wait, last acting [1,8]
pg 3.3de is stuck undersized for 959.223153, current state
active+undersized+degraded+remapped+backfill_wait, last acting [9,2]
pg 3.3df is stuck undersized for 980.994984, current state
active+undersized+degraded+remapped+backfill_wait, last acting [0,8]
pg 3.3e0 is stuck undersized for 970.654694, current state
active+undersized+degraded+remapped+backfill_wait, last acting [10,14]
pg 3.3e1 is stuck undersized for 980.977833, current state
active+undersized+degraded+remapped+backfill_wait, last acting [3,11]
pg 3.3e2 is stuck undersized for 960.508628, current state
active+undersized+degraded+remapped+backfill_wait, last acting [8,2]
pg 3.3e4 is stuck undersized for 980.880956, current state
active+undersized+degraded+remapped+backfill_wait, last acting [3,8]
pg 3.3e5 is stuck undersized for 979.984434, current state
activating+undersized+degraded+remapped, last acting [0,9]
pg 3.3e6 is stuck undersized for 981.003536, current state
active+undersized+degraded+remapped+backfill_wait, last acting [11,3]
pg 3.3e7 is stuck undersized for 970.628714, current state
active+undersized+degraded+remapped+backfill_wait, last acting [12,14]
pg 3.3e8 is stuck undersized for 959.515353, current state
active+undersized+degraded+remapped+backfill_wait, last acting [2,12]
pg 3.3e9 is stuck undersized for 980.011154, current state
activating+undersized+degraded+remapped, last acting [10,0]
pg 3.3ea is stuck undersized for 979.937856, current state
active+undersized+degraded+remapped+backfill_wait, last acting [0,12]
pg 3.3eb is stuck undersized for 959.500630, current state
active+undersized+degraded+remapped+backfill_wait, last acting [12,2]
pg 3.3ec is stuck undersized for 980.013765, current state
active+undersized+degraded+remapped+backfill_wait, last acting [3,10]
pg 3.3ed is stuck undersized for 970.644391, current state
active+undersized+degraded+remapped+backfill_wait, last acting [14,17]
pg 3.3ee is stuck undersized for 979.970712, current state
active+undersized+degraded+remapped+backfill_wait, last acting [9,0]
pg 3.3f0 is stuck undersized for 959.514173, current state
active+undersized+degraded+remapped+backfill_wait, last acting [10,2]
pg 3.3f1 is stuck undersized for 970.611444, current state
active+undersized+degraded+remapped+backfill_wait, last acting [9,14]
pg 3.3f2 is stuck undersized for 979.995013, current state
active+undersized+degraded+remapped+backfill_wait, last acting [1,10]
pg 3.3f3 is stuck undersized for 959.219621, current state
active+undersized+degraded+remapped+backfill_wait, last acting [2,9]
pg 3.3f4 is stuck undersized for 970.631428, current state
active+undersized+degraded+remapped+backfill_wait, last acting [12,14]
pg 3.3f5 is stuck undersized for 959.504461, current state
active+undersized+degraded+remapped+backfill_wait, last acting [10,2]
pg 3.3f7 is stuck undersized for 970.645735, current state
active+undersized+degraded+remapped+backfill_wait, last acting [14,12]
pg 3.3f8 is stuck undersized for 960.489631, current state
active+undersized+degraded+remapped+backfill_wait, last acting [2,8]
pg 3.3f9 is stuck undersized for 979.957529, current state
active+undersized+degraded+remapped+backfill_wait, last acting [9,15]
pg 3.3fa is stuck undersized for 979.967269, current state
active+undersized+degraded+remapped+backfill_wait, last acting [12,3]
pg 3.3fb is stuck undersized for 981.001507, current state
active+undersized+degraded+remapped+backfill_wait, last acting [15,11]
pg 3.3fc is stuck undersized for 960.514524, current state
active+undersized+degraded+remapped+backfill_wait, last acting [2,11]
pg 3.3fd is stuck undersized for 960.542219, current state
active+undersized+degraded+remapped+backfill_wait, last acting [11,2]
pg 3.3fe is stuck undersized for 959.488418, current state
active+undersized+degraded+remapped+backfill_wait, last acting [10,2]
pg 3.3ff is stuck undersized for 970.660468, current state
active+undersized+degraded+remapped+backfill_wait, last acting [9,14]
REQUEST_SLOW 110 slow requests are blocked > 32 sec. Implicated osds 3,10,11
110 ops are blocked > 1048.58 sec
osds 3,10,11 have blocked requests > 1048.58 sec
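Most of the inactive PGs above are stuck in `activating` with only two OSDs in their acting set. A small filter can tally which OSDs those PGs depend on, which makes it easier to compare against the OSDs implicated in the slow requests. The function `count_activating_osds` is a hypothetical helper; it joins wrapped lines first, so it works both on live `ceph health detail` output and on a mail-wrapped copy like the one above, and it counts each PG only once even when it is listed under both PG_AVAILABILITY and PG_DEGRADED.

```shell
#!/bin/sh
# Hypothetical helper: read "ceph health detail" text on stdin and, for each
# PG whose current state starts with "activating", count how often every OSD
# appears in its acting set. Output: "<count> <osd id>", most frequent first.
count_activating_osds() {
  tr '\n' ' ' |
    grep -oE 'pg [0-9]+\.[0-9a-f]+ is stuck [a-z]+ for [0-9.]+, current state[[:space:]]+activating[a-z+]*, last acting \[[0-9,]*\]' |
    awk '!seen[$2]++' |
    sed -E 's/.*last acting \[([0-9,]*)\]$/\1/' |
    tr ',' '\n' |
    sort -n | uniq -c | sort -k1,1nr -k2,2n
}
```

Usage on a live cluster: `ceph health detail | count_activating_osds`.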
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com