Hello,
while digging into this further I saw that it takes ages until all PGs
are active again. After starting the OSD, 3% of all PGs are inactive and it
takes minutes until they become active.
The log of the OSD is full of:
2019-01-16 15:19:13.568527 7fecbf7da700 0 osd.33 pg_epoch: 1318479 pg[5.563( v 1318474'61584855 lc 1318356'61576253 (1318287'61574721,1318474'61584855] local-lis/les=1318472/1318473 n=1912 ec=133405/133405 lis/c 1318472/1278145 les/c/f 1318473/1278148/1211861 1318472/1318472/1318472) [33,3,22] r=0 lpr=1318472 pi=[1278145,1318472)/1 rops=4 crt=1318474'61584855 mlcod 1318356'61576253 active+recovering+degraded m=184 snaptrimq=[ec1a0~1,ec808~1] mbc={255={(2+0)=185,(3+0)=2}}] _update_calc_stats ml 185 upset size 3 up 2
2019-01-16 15:19:13.568637 7fecbf7da700 0 osd.33 pg_epoch: 1318479 pg[5.563( v 1318474'61584855 lc 1318356'61576253 (1318287'61574721,1318474'61584855] local-lis/les=1318472/1318473 n=1912 ec=133405/133405 lis/c 1318472/1278145 les/c/f 1318473/1278148/1211861 1318472/1318472/1318472) [33,3,22] r=0 lpr=1318472 pi=[1278145,1318472)/1 rops=4 crt=1318474'61584855 mlcod 1318356'61576253 active+recovering+degraded m=184 snaptrimq=[ec1a0~1,ec808~1] mbc={255={(2+0)=185,(3+0)=2}}] _update_calc_stats ml 2 upset size 3 up 3
2019-01-16 15:19:15.909327 7fecbf7da700 0 osd.33 pg_epoch: 1318479 pg[5.563( v 1318474'61584855 lc 1318356'61576253 (1318287'61574721,1318474'61584855] local-lis/les=1318472/1318473 n=1912 ec=133405/133405 lis/c 1318472/1278145 les/c/f 1318473/1278148/1211861 1318472/1318472/1318472) [33,3,22] r=0 lpr=1318472 pi=[1278145,1318472)/1 rops=4 crt=1318474'61584855 mlcod 1318356'61576253 active+recovering+degraded m=183 snaptrimq=[ec1a0~1,ec808~1] mbc={255={(2+0)=184,(3+0)=3}}] _update_calc_stats ml 184 upset size 3 up 2
2019-01-16 15:19:15.909446 7fecbf7da700 0 osd.33 pg_epoch: 1318479 pg[5.563( v 1318474'61584855 lc 1318356'61576253 (1318287'61574721,1318474'61584855] local-lis/les=1318472/1318473 n=1912 ec=133405/133405 lis/c 1318472/1278145 les/c/f 1318473/1278148/1211861 1318472/1318472/1318472) [33,3,22] r=0 lpr=1318472 pi=[1278145,1318472)/1 rops=4 crt=1318474'61584855 mlcod 1318356'61576253 active+recovering+degraded m=183 snaptrimq=[ec1a0~1,ec808~1] mbc={255={(2+0)=184,(3+0)=3}}] _update_calc_stats ml 3 upset size 3 up 3
2019-01-16 15:19:23.503231 7fecb97ff700 0 osd.33 pg_epoch: 1318479 pg[5.563( v 1318474'61584855 lc 1318356'61576253 (1318287'61574721,1318474'61584855] local-lis/les=1318472/1318473 n=1912 ec=133405/133405 lis/c 1318472/1278145 les/c/f 1318473/1278148/1211861 1318472/1318472/1318472) [33,3,22] r=0 lpr=1318472 pi=[1278145,1318472)/1 rops=4 crt=1318474'61584855 mlcod 1318356'61576253 active+recovering+degraded m=183 snaptrimq=[ec1a0~1,ec808~1] mbc={255={(2+0)=183,(3+0)=3}}] _update_calc_stats ml 183 upset size 3 up 2
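In case it is useful: the "minutes until active" above can be watched with a
small polling sketch like the one below. It just wraps `ceph -s --format json`
and counts the PG states that do not contain "active"; the field names follow
the luminous status output, so treat it as an illustration only, not a tool
from the cluster itself.

#!/usr/bin/env python
# Rough sketch: poll 'ceph -s' and report how many PGs are not active.
# Field names assume the luminous JSON status layout (pgmap -> pgs_by_state).
import json
import subprocess
import time

def inactive_pgs():
    out = subprocess.check_output(['ceph', '-s', '--format', 'json'])
    pgmap = json.loads(out)['pgmap']
    # every PG state string that does not contain "active" counts as inactive
    inactive = sum(s['count'] for s in pgmap['pgs_by_state']
                   if 'active' not in s['state_name'])
    return inactive, pgmap['num_pgs']

start = time.time()
while True:
    inactive, total = inactive_pgs()
    print('%6.0fs: %d of %d PGs inactive' % (time.time() - start, inactive, total))
    if inactive == 0:
        break
    time.sleep(5)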
Greets,
Stefan
On 16.01.19 at 09:12, Stefan Priebe - Profihost AG wrote:
Hi,
No, it was not OK. The bug is still present. It only appeared to work because
the osdmap was so far behind that it started backfill instead of recovery.
So it happens only in the recovery case.
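To confirm which case a freshly started OSD ends up in, something like the
sketch below can be used to list the PGs that are recovering vs. backfilling.
It parses 'ceph pg dump pgs_brief --format json'; the top-level JSON layout
differs a bit between releases, so both variants are handled, and the field
names are an assumption on my side.

#!/usr/bin/env python
# Rough sketch: count PGs that are currently recovering vs. backfilling.
# 'ceph pg dump pgs_brief' returns either a bare list or a dict with a
# 'pg_stats' key depending on the release, so both are handled here.
import json
import subprocess

out = subprocess.check_output(['ceph', 'pg', 'dump', 'pgs_brief', '--format', 'json'])
data = json.loads(out)
pgs = data['pg_stats'] if isinstance(data, dict) else data

recovering  = [pg['pgid'] for pg in pgs if 'recover' in pg['state']]
backfilling = [pg['pgid'] for pg in pgs if 'backfill' in pg['state']]

print('recovering : %d PGs %s' % (len(recovering), recovering[:10]))
print('backfilling: %d PGs %s' % (len(backfilling), backfilling[:10]))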
Greets,
Stefan
On 15.01.19 at 16:02, Stefan Priebe - Profihost AG wrote:
On 15.01.19 at 12:45, Marc Roos wrote:
I upgraded this weekend from 12.2.8 to 12.2.10 without such issues
(OSDs are idle)
It turns out this was a kernel bug. Updating to a newer kernel has
solved this issue.
Greets,
Stefan
-----Original Message-----
From: Stefan Priebe - Profihost AG [mailto:s.priebe@xxxxxxxxxxxx]
Sent: 15 January 2019 10:26
To: ceph-users@xxxxxxxxxxxxxx
Cc: n.fahldieck@xxxxxxxxxxxx
Subject: Re: slow requests and high i/o / read rate on
bluestore osds after upgrade 12.2.8 -> 12.2.10
Hello list,
I also tested the current upstream/luminous branch and it happens there as
well. A clean install works fine. It only happens on upgraded BlueStore OSDs.
Greets,
Stefan
On 14.01.19 at 20:35, Stefan Priebe - Profihost AG wrote:
While trying to upgrade a cluster from 12.2.8 to 12.2.10 I'm experiencing
issues with BlueStore OSDs, so I cancelled the upgrade and all BlueStore
OSDs are stopped now.
After starting a BlueStore OSD I'm seeing a lot of slow requests caused
by very high read rates.
Device:   rrqm/s  wrqm/s    r/s    w/s      rkB/s    wkB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
sda        45,00  187,00 767,00  39,00  482040,00  8660,00  1217,62    58,16  74,60   73,85   89,23   1,24 100,00
The OSD reads permanently at around 500 MB/s from the disk and can't service
client requests. The overall client read rate is only 10.9 MiB/s rd.
I can't reproduce this with 12.2.8. Is this a known bug / regression?
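One way to see where those reads go would be to diff two admin-socket perf
dumps of the affected OSD a few seconds apart, roughly like the sketch below.
The osd id and interval are just examples; the script only prints the counters
that grew the most, so no specific counter names are assumed.

#!/usr/bin/env python
# Rough sketch: diff two 'perf dump' snapshots of one OSD to see which
# counters grow fastest while the disk is being hammered. Must run on the
# OSD host (admin socket); the osd id below is only an example.
import json
import subprocess
import time

OSD = 'osd.33'          # example id, adjust as needed
INTERVAL = 10           # seconds between the two snapshots

def flatten(d, prefix=''):
    """Flatten the nested perf dump into {'section.counter': number}."""
    flat = {}
    for key, val in d.items():
        name = prefix + key
        if isinstance(val, dict):
            flat.update(flatten(val, name + '.'))
        elif isinstance(val, (int, float)):
            flat[name] = val
    return flat

def snapshot():
    out = subprocess.check_output(['ceph', 'daemon', OSD, 'perf', 'dump'])
    return flatten(json.loads(out))

before = snapshot()
time.sleep(INTERVAL)
after = snapshot()
deltas = [(after[k] - before.get(k, 0), k) for k in after]
for delta, name in sorted(deltas, reverse=True)[:20]:
    if delta:
        print('%15d  %s' % (delta, name))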
Greets,
Stefan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com