Hello,
while digging into this further I saw that it takes ages until all PGs
are active again. After starting the OSD, 3% of all PGs are inactive and it
takes minutes until they become active.
The log of the OSD is full of:
2019-01-16 15:19:13.568527 7fecbf7da700 0 osd.33 pg_epoch: 1318479 pg[5.563( v 1318474'61584855 lc 1318356'61576253 (1318287'61574721,1318474'61584855] local-lis/les=1318472/1318473 n=1912 ec=133405/133405 lis/c 1318472/1278145 les/c/f 1318473/1278148/1211861 1318472/1318472/1318472) [33,3,22] r=0 lpr=1318472 pi=[1278145,1318472)/1 rops=4 crt=1318474'61584855 mlcod 1318356'61576253 active+recovering+degraded m=184 snaptrimq=[ec1a0~1,ec808~1] mbc={255={(2+0)=185,(3+0)=2}}] _update_calc_stats ml 185 upset size 3 up 2
2019-01-16 15:19:13.568637 7fecbf7da700 0 osd.33 pg_epoch: 1318479 pg[5.563( v 1318474'61584855 lc 1318356'61576253 (1318287'61574721,1318474'61584855] local-lis/les=1318472/1318473 n=1912 ec=133405/133405 lis/c 1318472/1278145 les/c/f 1318473/1278148/1211861 1318472/1318472/1318472) [33,3,22] r=0 lpr=1318472 pi=[1278145,1318472)/1 rops=4 crt=1318474'61584855 mlcod 1318356'61576253 active+recovering+degraded m=184 snaptrimq=[ec1a0~1,ec808~1] mbc={255={(2+0)=185,(3+0)=2}}] _update_calc_stats ml 2 upset size 3 up 3
2019-01-16 15:19:15.909327 7fecbf7da700 0 osd.33 pg_epoch: 1318479 pg[5.563( v 1318474'61584855 lc 1318356'61576253 (1318287'61574721,1318474'61584855] local-lis/les=1318472/1318473 n=1912 ec=133405/133405 lis/c 1318472/1278145 les/c/f 1318473/1278148/1211861 1318472/1318472/1318472) [33,3,22] r=0 lpr=1318472 pi=[1278145,1318472)/1 rops=4 crt=1318474'61584855 mlcod 1318356'61576253 active+recovering+degraded m=183 snaptrimq=[ec1a0~1,ec808~1] mbc={255={(2+0)=184,(3+0)=3}}] _update_calc_stats ml 184 upset size 3 up 2
2019-01-16 15:19:15.909446 7fecbf7da700 0 osd.33 pg_epoch: 1318479 pg[5.563( v 1318474'61584855 lc 1318356'61576253 (1318287'61574721,1318474'61584855] local-lis/les=1318472/1318473 n=1912 ec=133405/133405 lis/c 1318472/1278145 les/c/f 1318473/1278148/1211861 1318472/1318472/1318472) [33,3,22] r=0 lpr=1318472 pi=[1278145,1318472)/1 rops=4 crt=1318474'61584855 mlcod 1318356'61576253 active+recovering+degraded m=183 snaptrimq=[ec1a0~1,ec808~1] mbc={255={(2+0)=184,(3+0)=3}}] _update_calc_stats ml 3 upset size 3 up 3
2019-01-16 15:19:23.503231 7fecb97ff700 0 osd.33 pg_epoch: 1318479 pg[5.563( v 1318474'61584855 lc 1318356'61576253 (1318287'61574721,1318474'61584855] local-lis/les=1318472/1318473 n=1912 ec=133405/133405 lis/c 1318472/1278145 les/c/f 1318473/1278148/1211861 1318472/1318472/1318472) [33,3,22] r=0 lpr=1318472 pi=[1278145,1318472)/1 rops=4 crt=1318474'61584855 mlcod 1318356'61576253 active+recovering+degraded m=183 snaptrimq=[ec1a0~1,ec808~1] mbc={255={(2+0)=183,(3+0)=3}}] _update_calc_stats ml 183 upset size 3 up 2
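In case it is useful: the "minutes until active" above can be watched with a
small polling sketch like the one below. It just wraps `ceph -s --format json`
and counts the PG states that do not contain "active"; the field names follow
the luminous status output, so treat it as an illustration only, not a tool
from the cluster itself.

#!/usr/bin/env python
# Rough sketch: poll 'ceph -s' and report how many PGs are not active.
# Field names assume the luminous JSON status layout (pgmap -> pgs_by_state).
import json
import subprocess
import time

def inactive_pgs():
    out = subprocess.check_output(['ceph', '-s', '--format', 'json'])
    pgmap = json.loads(out)['pgmap']
    # every PG state string that does not contain "active" counts as inactive
    inactive = sum(s['count'] for s in pgmap['pgs_by_state']
                   if 'active' not in s['state_name'])
    return inactive, pgmap['num_pgs']

start = time.time()
while True:
    inactive, total = inactive_pgs()
    print('%6.0fs: %d of %d PGs inactive' % (time.time() - start, inactive, total))
    if inactive == 0:
        break
    time.sleep(5)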
Greets,
Stefan
On 16.01.19 at 09:12, Stefan Priebe - Profihost AG wrote:
Hi,
No, it was not OK. The bug is still present. It only appeared to work because
the osdmap was so far behind that it started backfill instead of recovery.
So it happens only in the recovery case.
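To confirm which case a freshly started OSD ends up in, something like the
sketch below can be used to list the PGs that are recovering vs. backfilling.
It parses 'ceph pg dump pgs_brief --format json'; the top-level JSON layout
differs a bit between releases, so both variants are handled, and the field
names are an assumption on my side.

#!/usr/bin/env python
# Rough sketch: count PGs that are currently recovering vs. backfilling.
# 'ceph pg dump pgs_brief' returns either a bare list or a dict with a
# 'pg_stats' key depending on the release, so both are handled here.
import json
import subprocess

out = subprocess.check_output(['ceph', 'pg', 'dump', 'pgs_brief', '--format', 'json'])
data = json.loads(out)
pgs = data['pg_stats'] if isinstance(data, dict) else data

recovering  = [pg['pgid'] for pg in pgs if 'recover' in pg['state']]
backfilling = [pg['pgid'] for pg in pgs if 'backfill' in pg['state']]

print('recovering : %d PGs %s' % (len(recovering), recovering[:10]))
print('backfilling: %d PGs %s' % (len(backfilling), backfilling[:10]))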
Greets,
Stefan
On 15.01.19 at 16:02, Stefan Priebe - Profihost AG wrote:
On 15.01.19 at 12:45, Marc Roos wrote:
I upgraded this weekend from 12.2.8 to 12.2.10 without such issues
(OSDs are idle)
It turns out this was a kernel bug. Updating to a newer kernel has
solved this issue.
Greets,
Stefan
-----Original Message-----
From: Stefan Priebe - Profihost AG [mailto:s.priebe@xxxxxxxxxxxx]
Sent: 15 January 2019 10:26
To: ceph-users@xxxxxxxxxxxxxx
Cc: n.fahldieck@xxxxxxxxxxxx
Subject: Re: slow requests and high i/o / read rate on
bluestore osds after upgrade 12.2.8 -> 12.2.10
Hello list,
I also tested the current upstream/luminous branch and it happens there as
well. A clean install works fine. It only happens on upgraded BlueStore OSDs.
Greets,
Stefan
On 14.01.19 at 20:35, Stefan Priebe - Profihost AG wrote:
While trying to upgrade a cluster from 12.2.8 to 12.2.10 I'm experiencing
issues with BlueStore OSDs, so I cancelled the upgrade and all BlueStore
OSDs are stopped now.
After starting a BlueStore OSD I'm seeing a lot of slow requests caused
by very high read rates.
Device:   rrqm/s  wrqm/s    r/s    w/s      rkB/s    wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda        45,00  187,00 767,00  39,00  482040,00  8660,00  1217,62    58,16  74,60   73,85   89,23   1,24 100,00
The OSD reads permanently at around 500 MB/s from the disk and can't service
client requests. The overall client read rate is only 10.9 MiB/s rd.
I can't reproduce this with 12.2.8. Is this a known bug / regression?
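One way to see where those reads go would be to diff two admin-socket perf
dumps of the affected OSD a few seconds apart, roughly like the sketch below.
The osd id and interval are just examples; the script only prints the counters
that grew the most, so no specific counter names are assumed.

#!/usr/bin/env python
# Rough sketch: diff two 'perf dump' snapshots of one OSD to see which
# counters grow fastest while the disk is being hammered. Must run on the
# OSD host (admin socket); the osd id below is only an example.
import json
import subprocess
import time

OSD = 'osd.33'          # example id, adjust as needed
INTERVAL = 10           # seconds between the two snapshots

def flatten(d, prefix=''):
    """Flatten the nested perf dump into {'section.counter': number}."""
    flat = {}
    for key, val in d.items():
        name = prefix + key
        if isinstance(val, dict):
            flat.update(flatten(val, name + '.'))
        elif isinstance(val, (int, float)):
            flat[name] = val
    return flat

def snapshot():
    out = subprocess.check_output(['ceph', 'daemon', OSD, 'perf', 'dump'])
    return flatten(json.loads(out))

before = snapshot()
time.sleep(INTERVAL)
after = snapshot()
deltas = [(after[k] - before.get(k, 0), k) for k in after]
for delta, name in sorted(deltas, reverse=True)[:20]:
    if delta:
        print('%15d  %s' % (delta, name))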
Greets,
Stefan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com